Abstract

Training-based transmission over Rayleigh block-fading multiple-input multiple-output (MIMO) channels is inves- 
tigated. As a training method a combination of a pilot-assisted scheme and a biased signaling scheme is considered. The 
achievable rate of a successive decoding (SD) receiver based on the linear minimum mean-squared error (LMMSE) 
channel estimation is analyzed in the large-system limit, by using the so-called replica method. It is shown that 
negligible pilot information is best in terms of the achievable rate of the SD receiver in the large-system limit. 
Moreover, the obtained analytical formula of the achievable rate can improve the existing lower bound for the 
capacity of the MIMO channel with no channel state information (CSI), derived by Hassibi and Hochwald, for all 
signal-to-noise ratios (SNRs), while there is a gap between the obtained lower bound and the channel capacity. Energy 
efficiency in the low SNR regime is also investigated in terms of the power per information bit required for reliable 
communication. The required minimum power is shown to be achieved at a positive rate for the SD receiver with 
no CSI, whereas it is achieved in the zero-rate limit for the case of perfect CSI available at the receiver. The results 
presented in this paper imply that SD schemes can provide a significant performance gain in the low-to-moderate 
SNR regimes, compared to conventional receivers based on one-shot channel estimation. 
Index Terms: Training-based transmission, linear minimum mean-squared error (LMMSE) channel estimation, successive decoding (SD),
biased signaling, achievable rates, energy efficiency, large-system analysis, replica method.

I. Introduction 

Multiple-input multiple-output (MIMO) transmission is a promising scheme for increasing the spectral efficiency 
of wireless communication systems, and has been applied to several modern standards, such as wireless LAN (IEEE
802.11n) and Mobile WiMAX (IEEE 802.16e). However, the ultimate achievable rate of MIMO systems is not fully
understood. Thus, it is an important issue in information theory to elucidate the channel capacity of MIMO systems. 

The capacity of MIMO channels with perfect channel state information (CSI) at the receiver was analyzed in the 
early pioneering works [1], [2]. Telatar [2] proved that independent and identically distributed (i.i.d.) Gaussian
signaling is optimal for i.i.d. Rayleigh fading MIMO channels with perfect CSI at the receiver. See, e.g., [3]
for the case of more sophisticated fading models. The assumption of perfect CSI available at the receiver is a 
reasonable assumption if coherence time is sufficiently long. However, this assumption becomes unrealistic for 
mobile communications with relatively short coherence time. Thus, it is worth considering the assumption of CSI 
available neither to the transmitter nor to the receiver, while the receiver is assumed to know the statistical model 
of the channel perfectly. In this paper, this assumption is simply referred to as the assumption of no CSI available. 

Marzetta and Hochwald [4] considered i.i.d. Rayleigh block-fading MIMO channels with no CSI, and characterized
a class of capacity-achieving signaling schemes. In block-fading channels, the channel is fixed during one
fading block and independently changes at the beginning of the next fading block. The assumption of block-fading 
simplifies the analysis of the capacity, although it might be an idealized assumption.¹ See [5], [6] for the capacity of
time-varying MIMO channels with no CSI. In this paper, we consider block-fading MIMO channels with no CSI. 

The capacity-achieving inputs are not i.i.d. over space or time for block-fading MIMO channels with no CSI [4].
These dependencies over space and time make it difficult to calculate the capacity. In order to circumvent this 
difficulty, three kinds of strategies have been considered in the literature. A first strategy is to obtain a closed form 
for a lower bound on the capacity by considering unitary space-time modulation [7], although the closed form does
not seem to provide much insight. It is possible to calculate a lower bound on the capacity numerically
for all signal-to-noise ratios (SNRs) [8], [9], while this task is not necessarily easy in terms of computational
complexity. 

A second strategy is to consider the high or low SNR limits. This strategy can provide an analytical formula of 
the capacity in return for giving up the capacity result in the moderate SNR regime. Zheng and Tse [10] derived a
high SNR approximation of the capacity, which tolerates an error of $o(1)$ in the high SNR limit. In this paper, their
high SNR approximation is referred to as the Zheng-Tse (ZT) approximation. Their analytical formula provides a 
useful geometric insight, i.e., the capacity of MIMO channels with no CSI has an interpretation as sphere packing in
the Grassmann manifold, while the capacity for the case of perfect CSI available at the receiver has an interpretation
in terms of sphere packing in the Euclidean space.

¹ The assumption of block-fading is valid for time-division multiple-access (TDMA) schemes.

The power per information bit $E_{\mathrm{b}}$ required for reliable communication is a key performance measure in the low
SNR regime. Verdú [11] proved that the SNR per information bit $E_{\mathrm{b}}/N_0$, with $N_0$ denoting noise power, required
for MIMO channels with no CSI achieves the minimum $N E_{\mathrm{b}}/N_0 = \ln 2 \approx -1.59$ dB in the low SNR limit, with
$N$ denoting the number of receive antennas. This result provides a fundamental limit in terms of energy efficiency.
See [12], [13] for a more detailed analysis.
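The decibel value quoted above follows directly from converting $\ln 2$ to a logarithmic scale. The snippet below is only a numerical check of that conversion; the function name is illustrative and not taken from the paper.

```python
import math


def min_ebn0_db() -> float:
    """Return the low-SNR limit N * E_b / N_0 = ln 2 expressed in decibels."""
    return 10.0 * math.log10(math.log(2.0))


print(f"{min_ebn0_db():.2f} dB")  # -1.59 dB
```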

The last strategy is to analyze the achievable rate of a training-based system, which obviously provides a lower 
bound on the capacity. Results based on the last strategy are less explored than those based on the other two strategies. 
Simple modulation schemes, such as quadrature phase shift keying (QPSK) or quadrature amplitude modulation 
(QAM), are commonly used for training-based systems. Consequently, it is possible to obtain an analytical bound 
that can be easily evaluated for all SNRs. Hassibi and Hochwald [14] derived an analytical lower bound for the
achievable rate of a pilot-assisted system, called the Hassibi-Hochwald (HH) bound in this paper. Another advantage 
is that it can provide a useful guideline for designing practical training-based MIMO systems. In fact, it was shown 
in [14] that the optimal number of pilot symbols is equal to the number of transmit antennas in terms of their lower
bound. A weakness is that lower bounds derived by the last strategy might be loose, compared to those derived by 
the first strategy. In this paper, we focus on the last strategy and improve the existing lower bound of the capacity 
based on training. 

Hassibi and Hochwald [14] used a method for lower-bounding the achievable rate of a pilot-assisted system,
developed by Médard in [15]. As shown in [16], using this method requires the assumption of one-shot channel
estimation, under which the decoder regards the channel estimates provided by the channel estimator as the true
ones; in other words, the decoded data symbols are not re-utilized for refining the channel estimates. Thus, a lower
bound based on training-based systems should improve by refining the channel estimates with the decoded data 
symbols. 

We follow a successive decoding (SD) strategy considered in [16]-[18], in which the data symbols decoded in
the preceding stages are utilized for refining the channel estimates. Padmanabhan et al. [18] calculated lower and
upper bounds for the achievable rate of an SD receiver by numerical simulations. However, they did not provide 
any analytical results for the achievable rate. The goal of this paper is to derive an analytical bound based on the 
SD strategy. 

The channel estimator for the optimal SD receiver is nonlinear in general. Consequently, the distribution of the 
channel estimates becomes non-Gaussian. This non-Gaussianity makes it difficult to calculate the achievable rate of 
the optimal SD receiver. In order to circumvent this difficulty, we consider a lower bound based on linear minimum
mean-squared error (LMMSE) channel estimation. In channel estimation, pilot signals transmitted by using a fraction
of resources are utilized for the initial channel estimation. In this paper, a combination of the conventional
pilot-assisted scheme and a biased signaling scheme [19] is investigated. In the biased signaling scheme, a probabilistic
bias of transmitted symbols is used for the initial channel estimation, while time-division multiplexed pilot symbols
are utilized in the pilot-assisted scheme. We optimize the pilot overhead on the basis of the achievable rate for an
SD receiver based on LMMSE channel estimation. 

In order to obtain analytical results, we take the large-system limit, in which the number of transmit antennas, the 
number of receive antennas, and coherence time tend to infinity while their ratios are kept constant. The large-system 
limit has been extensively considered in the analysis of code-division multiple-access (CDMA) or MIMO systems 
with perfect CSI at the receiver, by using random matrix theory [20]-[23] and the replica method [24]-[30]. The
advantage of taking the large-system limit is that several performance measures, such as mutual information and
signal-to-interference-plus-noise ratio (SINR), are expected to be self-averaging, i.e., they converge in probability to
deterministic values in the large-system limit. This self-averaging property allows us to obtain analytical results. 

We use the replica method to evaluate the achievable rate in the large-system limit. The replica method was 
originally developed in statistical physics [31]. See [32]-[34] for the details of the replica method. Recently, it has
been recognized that the replica method is useful for analyzing nonlinear receivers [24]-[30]. A weakness of the
replica method is that it is based on several assumptions that have not been rigorously justified at present. See [35], [36] for
recent remarkable progress with respect to the replica method.

This paper is organized as follows: A Rayleigh block-fading MIMO channel is introduced in Section II. The
achievable rate of an SD receiver based on LMMSE channel estimation is formulated in Section III. The main
results of this paper are presented in Section IV. The obtained analytical bound is compared to existing results in
Section V. We conclude this paper in Section VI. The derivation of the main results is summarized in appendices.

A. Notation 

For a complex number $z \in \mathbb{C}$, throughout this paper, $j$, $\Re[z]$, $\Im[z]$, and $z^*$ denote the imaginary unit, the
real and imaginary parts of $z$, and the complex conjugate of $z$, respectively. For a complex matrix $\mathbf{A}$, $\mathbf{A}^T$,
$\mathbf{A}^H$, $\mathrm{Tr}(\mathbf{A})$, and $\det \mathbf{A}$ represent the transpose, the conjugate transpose, the trace, and the determinant of $\mathbf{A}$,
respectively. The vector $\mathbf{1}_n$ denotes the $n$-dimensional vector whose elements are all one. The $n \times n$ identity matrix
is denoted by $\mathbf{I}_n$. The operator $\otimes$ denotes the Kronecker product between two matrices. The matrix $\mathrm{diag}(a_1, \ldots, a_n)$
represents the diagonal matrix with $a_i$ as the $i$th diagonal element. $\mathcal{M}_n^{+}$ denotes the set of all positive definite $n \times n$
Hermitian matrices. $\log x$, $\ln x$, $\delta(\cdot)$, and $\delta_{a,b}$ denote $\log_2 x$, $\log_e x$, the Dirac delta function, and the Kronecker delta,
respectively. For random variables $X$, $Y$, and $Z$, $I(X; Y|Z)$ denotes the conditional mutual information between $X$
and $Y$ given $Z$ with the logarithm to base 2. For a complex random vector $\mathbf{x}$ and a random variable $Y$, $\mathrm{cov}[\mathbf{x}|Y]$
represents the covariance matrix of $\mathbf{x}$ given $Y$. $\mathcal{CN}(\mathbf{m}, \boldsymbol{\Sigma})$ denotes a proper complex Gaussian distribution with
mean $\mathbf{m}$ and a covariance matrix $\boldsymbol{\Sigma}$ [37]. For covariance matrices $\boldsymbol{\Sigma}$ and $\tilde{\boldsymbol{\Sigma}}$, $D_a(\boldsymbol{\Sigma} \| \tilde{\boldsymbol{\Sigma}})$ represents the
Kullback-Leibler divergence with the logarithm to base $a$ between $\mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma})$ and $\mathcal{CN}(\mathbf{0}, \tilde{\boldsymbol{\Sigma}})$.

As a notational convenience for subsets of the natural numbers $\mathbb{N}$, we use $[a, b) = \{i \in \mathbb{N} : a \le i < b\}$ for integers
$a$ and $b$ ($> a$). The other sets $[a, b]$, $(a, b)$, and so on are defined in the same manner. The set
$\mathcal{J} \backslash \{j\} = \{j' \in \mathcal{J} : j' \ne j\}$ denotes the set obtained by eliminating the element $j$ from $\mathcal{J}$. When $\mathcal{J}$ equals the set of all
indices $\{j\}$, $\mathcal{J} \backslash \{j\}$ is simply written as $\backslash j$. For scalars $\{v_i\}$, $\mathbf{v}_{[a,b)}$ denotes the column vector
$\mathbf{v}_{[a,b)} = (v_a, \ldots, v_{b-1})^T$, while $\vec{\mathbf{v}}_{[a,b)}$ represents the row vector $\vec{\mathbf{v}}_{[a,b)} = (v_a, \ldots, v_{b-1})$.
For column vectors $\{\mathbf{a}_i\}$, similarly, $\mathbf{A}_{[a,b)}$ denotes the matrix
$\mathbf{A}_{[a,b)} = (\mathbf{a}_a, \ldots, \mathbf{a}_{b-1})$. We use symbols with tildes and hats to represent random variables for postulated (or
virtual) channels and estimates of random variables, respectively. Underlined symbols are used to represent random
variables for decoupled channels.

II. Channel Model 

A. MIMO Channel 

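A narrowband MIMO system with $M$ transmit antennas and $N$ receive antennas is considered. We assume
block-fading with coherence time $T_{\mathrm{c}}$, i.e., the channel matrix $\mathbf{H} \in \mathbb{C}^{N \times M}$ is kept constant during one fading block
consisting of $T_{\mathrm{c}}$ symbol periods, and at the beginning of the next fading block the channel matrix is independently
sampled from a distribution. The received vector $\mathbf{y}_t \in \mathbb{C}^{N}$ in the $t$th symbol period within a fading block is given
by

$$\mathbf{y}_t = \frac{1}{\sqrt{M}} \mathbf{H} \mathbf{x}_t + \mathbf{n}_t, \quad t = 1, \ldots, T_{\mathrm{c}}, \tag{1}$$

where $\mathbf{x}_t = (x_{1,t}, \ldots, x_{M,t})^T$ and $\mathbf{n}_t \sim \mathcal{CN}(\mathbf{0}, N_0 \mathbf{I}_N)$ denote the transmitted vector in the $t$th symbol period
and an additive white Gaussian noise (AWGN) vector with a covariance matrix $N_0 \mathbf{I}_N$, respectively. The MIMO
channel (1) can be represented in matrix form as

$$\mathbf{Y} = \frac{1}{\sqrt{M}} \mathbf{H} \mathbf{X} + \mathbf{N}, \tag{2}$$

with $\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_{T_{\mathrm{c}}})$, $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_{T_{\mathrm{c}}})$, and $\mathbf{N} = (\mathbf{n}_1, \ldots, \mathbf{n}_{T_{\mathrm{c}}})$.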

For simplicity of analysis, we assume i.i.d. Rayleigh fading MIMO channels, i.e., the channel matrix $\mathbf{H}$ has
mutually independent entries, and each entry $h_{n,m} = (\mathbf{H})_{n,m}$ is drawn from the circularly symmetric complex
Gaussian distribution $\mathcal{CN}(0, 1)$ with unit variance. Note that the assumption of i.i.d. Rayleigh fading might be an
idealized assumption since there can be correlations between the elements of the channel matrix in practice.
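To make the channel model concrete, the following sketch draws one fading block of the i.i.d. Rayleigh block-fading channel with the $1/\sqrt{M}$ normalization used above. It is a minimal illustration in pure Python, not code from the paper: the helper names `crandn` and `simulate_block`, the use of unbiased QPSK inputs, and the parameter values are all assumptions made for the example.

```python
import math
import random


def crandn(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_block(M, N, Tc, P, N0, rng):
    """Simulate one fading block of Y = H X / sqrt(M) + noise; return (H, X, Y)."""
    # i.i.d. Rayleigh fading: every entry of the N x M channel matrix is CN(0, 1).
    H = [[crandn(rng) for _ in range(M)] for _ in range(N)]
    # Illustrative input (an assumption): unbiased QPSK with |x|^2 = P per symbol.
    a = math.sqrt(P / 2.0)
    X = [[complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(Tc)]
         for _ in range(M)]
    # The channel matrix H is reused for all Tc symbol periods of the block.
    Y = [[sum(H[n][m] * X[m][t] for m in range(M)) / math.sqrt(M) + crandn(rng, N0)
          for t in range(Tc)]
         for n in range(N)]
    return H, X, Y


H, X, Y = simulate_block(M=4, N=4, Tc=8, P=1.0, N0=0.1, rng=random.Random(0))
print(len(Y), "x", len(Y[0]))  # dimensions of the received matrix (N x Tc)
```

Calling `simulate_block` again with the same generator yields an independent channel matrix, which mimics the independent redraw at the start of each fading block.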

We impose a power constraint

$$\frac{1}{M T_{\mathrm{c}}} \sum_{m=1}^{M} \sum_{t=1}^{T_{\mathrm{c}}} \mathbb{E}\left[ |x_{m,t}|^2 \right] \le P, \tag{3}$$

for $P > 0$. Marzetta and Hochwald [4] proved that the capacity does not decrease even if the power constraint is
strengthened to a power constraint on each transmitted symbol,

$$\mathbb{E}\left[ |x_{m,t}|^2 \right] \le P. \tag{4}$$

The former power constraint (3) allows us to use power allocation over space and time, whereas the latter power
constraint (4) does not. In this paper, we only consider the latter power constraint (4), which simplifies the analysis.

B. Training-Based Transmission 

We assume that neither the transmitter nor the receiver has CSI. More precisely, only the statistical properties
of the MIMO channel (1) are assumed to be known to the receiver. The previous works [10], [14] showed that
pilot-assisted channel estimation can achieve the capacity in the leading order of SNR in the high SNR regime, i.e.,
the full spatial multiplexing gain, while the obtained lower bounds are loose in the low-to-moderate SNR regime.
Channel estimation based on pilot information is also considered in this paper. The main difference between previous 
works and this paper appears in the receiver structure. We consider joint channel and data estimation based on SD, 
whereas in previous works data symbols decoded successfully were not utilized for refining channel estimates. 

One fading block is decomposed into the training phase $\mathcal{T}_{T_{\mathrm{tr}}} = \{1, \ldots, T_{\mathrm{tr}}\}$ and the communication phase
$\mathcal{C}_{T_{\mathrm{tr}}+1} = \{T_{\mathrm{tr}} + 1, \ldots, T_{\mathrm{c}}\}$, which consist of the first $T_{\mathrm{tr}}$ symbol periods and of the remaining $(T_{\mathrm{c}} - T_{\mathrm{tr}})$ symbol
periods, respectively. The transmitter sends pilot symbol vectors in the training phase, and transmits data symbol
vectors in the communication phase. Therefore, the transmitted vector $\mathbf{x}_t$ is assumed to be known to the receiver
for $t \in \mathcal{T}_{T_{\mathrm{tr}}}$. For simplicity, we assume that the pilot symbol matrix $\mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}} = (\mathbf{x}_1, \ldots, \mathbf{x}_{T_{\mathrm{tr}}}) \in \mathbb{C}^{M \times T_{\mathrm{tr}}}$ has
zero-mean i.i.d. entries with i.i.d. real and imaginary parts. Furthermore, we assume that each pilot symbol satisfies
$\mathbb{E}[|x_{m,t}|^2] = P$ for $t \in \mathcal{T}_{T_{\mathrm{tr}}}$, since the accuracy of channel estimation should improve as the power of pilot
symbols increases. The transmission of i.i.d. data symbols can achieve the capacity of the MIMO channel (1) with
perfect CSI at the receiver. Thus, if accurate channel estimates are obtained by joint channel and data estimation,
i.i.d. signaling should be a reasonable option for training-based transmissions. We assume that the data symbols
$\{x_{m,t} : t \in \mathcal{C}_{T_{\mathrm{tr}}+1}\}$ are i.i.d. random variables with i.i.d. real and imaginary parts for all $m$ and $t \in \mathcal{C}_{T_{\mathrm{tr}}+1}$. Note
that zero-mean is not assumed for the data symbols. Under this assumption, the achievable rate is monotonically
increasing with the power of each data symbol. We hereafter let $\mathbb{E}[|x_{m,t}|^2] = P$.

In this paper, we consider a biased signaling scheme, in which the mean $\mathbb{E}[x_{m,t}] = \theta_{m,t}$ of the data symbol for
$t \in \mathcal{C}_{T_{\mathrm{tr}}+1}$ is biased while the long-term average $(T_{\mathrm{c}} - T_{\mathrm{tr}})^{-1} \sum_{t=T_{\mathrm{tr}}+1}^{T_{\mathrm{c}}} \theta_{m,t}$ tends to zero as $T_{\mathrm{c}} \to \infty$. In order to
apply the replica method, we assume that $\{\Re[\theta_{m,t}], \Im[\theta_{m,t}] : \text{for all } m, t\}$ are independently drawn from a
zero-mean hyperprior probability density function (pdf)² $p(\theta)$ with variance $\sigma_\theta^2/2$. The transmitter informs the receiver
in advance about the bias matrix $\boldsymbol{\Theta} = (\mathbf{O}, \boldsymbol{\theta}_{T_{\mathrm{tr}}+1}, \ldots, \boldsymbol{\theta}_{T_{\mathrm{c}}}) \in \mathbb{C}^{M \times T_{\mathrm{c}}}$, with $\boldsymbol{\theta}_t = (\theta_{1,t}, \ldots, \theta_{M,t})^T$. In other words,
$\boldsymbol{\Theta}$ is assumed to be known to the receiver. The biased signaling can reduce the overhead for training [19], compared
to the conventional pilot-assisted schemes. We present two examples of biased signaling: biased QPSK and biased
Gaussian signaling. See [38], [39] for implementations of biased QPSK.

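Example 1 (Biased QPSK). For $\Re[x_{m,t}], \Im[x_{m,t}] \in \{\pm\sqrt{P/2}\}$, the prior pmf of $x_{m,t}$ for biased QPSK is given
by

$$p(x_{m,t} | \theta_{m,t}) = \frac{1 + \Re[x_{m,t}]\Re[\theta_{m,t}]/(P/2)}{2} \cdot \frac{1 + \Im[x_{m,t}]\Im[\theta_{m,t}]/(P/2)}{2}. \tag{5}$$

The non-negativity of probability restricts the domain of the hyperprior pdf $p(\theta)$ to $\Re[\theta_{m,t}] \in [-\sqrt{P/2}, \sqrt{P/2}]$
and $\Im[\theta_{m,t}] \in [-\sqrt{P/2}, \sqrt{P/2}]$. It is straightforward to check that $\mathbb{E}[x_{m,t} | \theta_{m,t}] = \theta_{m,t}$ and $\mathbb{E}[|x_{m,t}|^2 | \theta_{m,t}] = P$.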
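The two moment conditions stated for biased QPSK can be checked numerically. The sketch below assumes a product-form pmf in which the real and imaginary parts are independent binary variables on $\{\pm\sqrt{P/2}\}$, each tilted linearly by the corresponding component of the bias; this form is an assumption chosen to satisfy the stated mean and power, and the function name `biased_qpsk_pmf` is hypothetical.

```python
import math


def biased_qpsk_pmf(theta: complex, P: float) -> dict:
    """Return {constellation point: probability} for a bias theta (assumed product form)."""
    a = math.sqrt(P / 2.0)
    if abs(theta.real) > a or abs(theta.imag) > a:
        raise ValueError("bias components must lie in [-sqrt(P/2), sqrt(P/2)]")
    pmf = {}
    for re in (-a, a):
        for im in (-a, a):
            # Each component is tilted so that its mean equals the bias component.
            p_re = 0.5 * (1.0 + re * theta.real / (P / 2.0))
            p_im = 0.5 * (1.0 + im * theta.imag / (P / 2.0))
            pmf[complex(re, im)] = p_re * p_im
    return pmf


pmf = biased_qpsk_pmf(0.3 - 0.2j, P=1.0)
mean = sum(x * p for x, p in pmf.items())
power = sum(abs(x) ** 2 * p for x, p in pmf.items())
print(mean, power)  # the mean should match the bias and the power should match P
```

The range check mirrors the non-negativity restriction on the bias: outside that square, one of the four probabilities would become negative.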

² When $\theta_{m,t}$ is discrete, $p(\theta)$ denotes a probability mass function (pmf).
Fig. 1. Successive decoding.



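Example 2 (Biased Gaussian Signaling). The prior pdf of $x_{m,t} \in \mathbb{C}$ for biased Gaussian signaling is given by

$$p(x_{m,t} | \theta_{m,t}) = \frac{1}{\pi (P - |\theta_{m,t}|^2)} \exp\left( -\frac{|x_{m,t} - \theta_{m,t}|^2}{P - |\theta_{m,t}|^2} \right). \tag{6}$$

Note that the domain of the hyperprior pdf $p(\theta)$ is restricted to $|\theta_{m,t}| < \sqrt{P}$, due to the positivity of variance.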
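Biased Gaussian signaling can be illustrated by sampling. The sketch below reads the prior as a proper complex Gaussian with mean $\theta_{m,t}$ and variance $P - |\theta_{m,t}|^2$, so that the total power stays at $P$; that reading, the function name, and the chosen bias value are assumptions for this example only.

```python
import math
import random


def sample_biased_gaussian(theta: complex, P: float, rng: random.Random) -> complex:
    """Draw x ~ CN(theta, P - |theta|^2), which requires |theta| < sqrt(P)."""
    var = P - abs(theta) ** 2
    if var <= 0.0:
        raise ValueError("the bias must satisfy |theta| < sqrt(P)")
    s = math.sqrt(var / 2.0)  # standard deviation per real dimension
    return theta + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


rng = random.Random(1)
theta, P, n = 0.5 + 0.25j, 1.0, 100_000
xs = [sample_biased_gaussian(theta, P, rng) for _ in range(n)]
emp_mean = sum(xs) / n
emp_power = sum(abs(x) ** 2 for x in xs) / n
print(emp_mean, emp_power)  # close to theta and P up to Monte Carlo error
```

The variance shrinks as the bias grows, which is why the bias magnitude must stay strictly below $\sqrt{P}$.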

The main results presented in this paper hold for a general prior of $x_{m,t}$ with finite moments. The performance
for the biased Gaussian signaling corresponds to a performance bound for multilevel modulation with trellis
shaping [40], [41].

III. Receivers 

A. Successive Decoding 

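We consider an SD receiver [17], [18] (see Fig. 1). The data symbol vectors $\{\mathbf{x}_t\}$ are decoded in the order
$t = T_{\mathrm{tr}} + 1, \ldots, T_{\mathrm{c}}$. In stage $t$, the matrix $\mathbf{X}_{(T_{\mathrm{tr}},t)} = (\mathbf{x}_{T_{\mathrm{tr}}+1}, \ldots, \mathbf{x}_{t-1}) \in \mathbb{C}^{M \times (t - T_{\mathrm{tr}} - 1)}$ contains the data symbol
vectors decoded in the preceding stages. Stage $t$ consists of $M$ substages, in which the elements $\{x_{m,t}\}$ are decoded
in the order $m = 1, \ldots, M$. In substage $m$ within stage $t$, the vector $\mathbf{x}_{[1,m),t} = (x_{1,t}, \ldots, x_{m-1,t})^T \in \mathbb{C}^{m-1}$ consists
of the data symbols decoded in the preceding substages. The channel estimator utilizes the data symbols $\mathbf{X}_{(T_{\mathrm{tr}},t)}$
decoded in the preceding stages, along with the received matrix $\mathbf{Y}_{\backslash t} \in \mathbb{C}^{N \times (T_{\mathrm{c}} - 1)}$ and the pilot information
$\{\mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}, \boldsymbol{\Theta}_{(t,T_{\mathrm{c}}]}\}$, in which the matrices $\mathbf{Y}_{\backslash t}$ and $\boldsymbol{\Theta}_{(t,T_{\mathrm{c}}]} \in \mathbb{C}^{M \times (T_{\mathrm{c}} - t)}$ are obtained by eliminating the $t$th column
vector from the received matrix $\mathbf{Y}$ and the first $t$ column vectors from the bias matrix $\boldsymbol{\Theta}$, respectively. We write the
information used for channel estimation in stage $t$ as $\mathcal{I}_t = \{\mathbf{X}_{\backslash t}, \mathbf{Y}_{\backslash t}\}$, with $\mathbf{X}_{\backslash t} = (\mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}, \mathbf{X}_{(T_{\mathrm{tr}},t)}, \boldsymbol{\Theta}_{(t,T_{\mathrm{c}}]}) \in
\mathbb{C}^{M \times (T_{\mathrm{c}} - 1)}$. In substage $m$ within stage $t$, the detector with successive interference cancellation (SIC) uses the data
symbols $\mathbf{x}_{[1,m),t}$ decoded in the preceding substages and the bias vector $\boldsymbol{\theta}_{[m,M],t} = (\theta_{m,t}, \ldots, \theta_{M,t})^T$ to subtract
inter-stream interference from the received vector and then performs multiuser detection (MUD).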

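Let us define the constrained capacity of the MIMO system based on the pilot information $\{\mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}, \boldsymbol{\Theta}\}$ as the
conditional mutual information per symbol period between all data symbol vectors $\{\mathbf{x}_t : t \in \mathcal{C}_{T_{\mathrm{tr}}+1}\}$ and the
received matrix $\mathbf{Y}$ conditioned on the pilot symbol matrix $\mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}$ and the bias matrix $\boldsymbol{\Theta}$,

$$C = \frac{1}{T_{\mathrm{c}}} I(\{\mathbf{x}_t : t \in \mathcal{C}_{T_{\mathrm{tr}}+1}\}; \mathbf{Y} | \mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}, \boldsymbol{\Theta}). \tag{7}$$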

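It is straightforward to confirm that the optimal SD receiver can achieve the constrained capacity (7). Applying the
chain rule for mutual information to (7) repeatedly [42], we obtain

$$\begin{aligned}
C &= \frac{1}{T_{\mathrm{c}}} \sum_{t=T_{\mathrm{tr}}+1}^{T_{\mathrm{c}}} I(\mathbf{x}_t; \mathbf{Y} | \mathbf{X}_{\mathcal{T}_{T_{\mathrm{tr}}}}, \mathbf{X}_{(T_{\mathrm{tr}},t)}, \boldsymbol{\Theta}) \\
&= \frac{1}{T_{\mathrm{c}}} \sum_{t=T_{\mathrm{tr}}+1}^{T_{\mathrm{c}}} I(\mathbf{x}_t; \mathbf{y}_t | \mathcal{I}_t, \boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)}, \boldsymbol{\theta}_t) \\
&= \frac{1}{T_{\mathrm{c}}} \sum_{t=T_{\mathrm{tr}}+1}^{T_{\mathrm{c}}} \sum_{m=1}^{M} I(x_{m,t}; \mathbf{y}_t | \mathcal{I}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}),
\end{aligned} \tag{8}$$

with $\boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)} = (\boldsymbol{\theta}_{T_{\mathrm{tr}}+1}, \ldots, \boldsymbol{\theta}_{t-1})$. In the derivation of the second equality, we have used the fact that $\mathbf{x}_t$ and
$\mathbf{Y}_{\backslash t}$ are independent of each other, due to the i.i.d. assumption of the data symbols. In the last expression, we
have omitted conditioning with respect to $\boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)}$ and $\boldsymbol{\theta}_{[1,m),t} = (\theta_{1,t}, \ldots, \theta_{m-1,t})^T$, which are not utilized by the
receiver in substage $m$ within stage $t$ since they are the parameters of the known data symbols $\mathbf{X}_{(T_{\mathrm{tr}},t)}$ and $\mathbf{x}_{[1,m),t}$.
For notational simplicity, this omission is applied throughout this paper. Note that $\boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)}$ and $\boldsymbol{\theta}_{[1,m),t}$ affect the
achievable rate (8). Expression (8) implies that the SD scheme results in no loss of information if the detector with
SIC can achieve the mutual information $I(x_{m,t}; \mathbf{y}_t | \mathcal{I}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t})$ in substage $m$ within stage $t$.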

It is difficult to evaluate the mutual information $I(x_{m,t}; \mathbf{y}_t | \mathcal{I}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t})$ exactly. Instead, we derive a
lower bound based on LMMSE channel estimation. We first introduce the optimal channel estimator and then define
the LMMSE channel estimator.

B. Channel Estimator 

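1) Optimal Channel Estimator: We focus on stage $t$ in this section. The optimal channel estimator estimates the
channel matrix $\mathbf{H}$ based on the information $\mathcal{I}_t$, and sends the joint posterior pdf

$$p(\mathbf{H} | \mathcal{I}_t) = \frac{p(\mathbf{Y}_{\backslash t} | \mathbf{H}, \mathbf{X}_{\backslash t}) p(\mathbf{H})}{\int p(\mathbf{Y}_{\backslash t} | \mathbf{H}, \mathbf{X}_{\backslash t}) p(\mathbf{H}) \, d\mathbf{H}} \tag{9}$$

to the detector with SIC. In (9), the pdf $p(\mathbf{Y}_{\backslash t} | \mathbf{H}, \mathbf{X}_{\backslash t})$ is decomposed into the product of pdfs $p(\mathbf{Y}_{\backslash t} | \mathbf{H}, \mathbf{X}_{\backslash t}) =
\prod_{t'=1}^{t-1} p(\mathbf{y}_{t'} | \mathbf{H}, \mathbf{x}_{t'}) \prod_{t'=t+1}^{T_{\mathrm{c}}} p(\mathbf{y}_{t'} | \mathbf{H}, \boldsymbol{\theta}_{t'})$, given by

$$p(\mathbf{y}_{t'} | \mathbf{H}, \boldsymbol{\theta}_{t'}) = \int p(\mathbf{y}_{t'} | \mathbf{H}, \mathbf{x}_{t'}) p(\mathbf{x}_{t'} | \boldsymbol{\theta}_{t'}) \, d\mathbf{x}_{t'}, \tag{10}$$

where $p(\mathbf{y}_{t'} | \mathbf{H}, \mathbf{x}_{t'})$ represents the MIMO channel (1). Note that the joint posterior pdf (9) is decomposed into the
product $\prod_{n=1}^{N} p(\mathbf{h}_n | \mathcal{I}_t)$ of the marginal posterior pdfs, with $\mathbf{h}_n \in \mathbb{C}^{1 \times M}$ denoting the $n$th row vector of $\mathbf{H}$, due
to the assumption of i.i.d. fading.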

The optimal channel estimator is nonlinear in general, which makes it difficult to analyze detectors with SIC, while 
it is possible to evaluate the performance of the optimal channel estimator. In order to circumvent this difficulty, 
we reduce the optimal channel estimator to an LMMSE channel estimator by considering a virtual MIMO channel. 
2) LMMSE Channel Estimator: We use Médard's method [15] to replace the MIMO channels (1) for $t' =
t + 1, \ldots, T_{\mathrm{c}}$ by virtual MIMO channels

$$\tilde{\mathbf{y}}_{t'} = \frac{1}{\sqrt{M}} \mathbf{H} \boldsymbol{\theta}_{t'} + \mathbf{w}_{t'} + \mathbf{n}_{t'}, \tag{11}$$


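where $\mathbf{w}_{t'} \in \mathbb{C}^{N}$ denotes a circularly symmetric complex Gaussian random vector with the covariance matrix
$(P - \sigma_{t'}^2) \mathbf{I}_N$, with $\sigma_{t'}^2 = M^{-1} \sum_{m=1}^{M} |\theta_{m,t'}|^2$. The virtual MIMO channel (11) is obtained by extracting the term
$M^{-1/2} \mathbf{H} (\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'})$ from the first term of the right-hand side in the original MIMO channel (1) and then replacing
it by the AWGN term $\mathbf{w}_{t'}$ with the covariance matrix $\mathrm{cov}[M^{-1/2} \mathbf{H} (\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'}) | \boldsymbol{\theta}_{t'}]$. This replacement implies that
information about the channel matrix included in $\mathbf{H} (\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'})$ is discarded. Thus, channel estimation based on the
virtual MIMO channel (11) should be inferior to that based on the original MIMO channel (1); in other words, the
mutual information $I(x_{m,t}; \mathbf{y}_t | \mathcal{I}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t})$ should be bounded from below by

$$I(x_{m,t}; \mathbf{y}_t | \mathcal{I}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}) \ge I(x_{m,t}; \mathbf{y}_t | \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}), \tag{12}$$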

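which denotes the constrained capacity of the MIMO channel (1) in symbol period $t$ with side information $\tilde{\mathcal{I}}_t =
\{\mathbf{X}_{\backslash t}, \tilde{\mathbf{Y}}_{\backslash t}\}$, $\mathbf{x}_{[1,m),t}$, and $\boldsymbol{\theta}_{[m,M],t}$, in which $\tilde{\mathbf{Y}}_{\backslash t} = (\mathbf{y}_1, \ldots, \mathbf{y}_{t-1}, \tilde{\mathbf{y}}_{t+1}, \ldots, \tilde{\mathbf{y}}_{T_{\mathrm{c}}})$ contains the received vectors
of the virtual MIMO channel (11) in the last $(T_{\mathrm{c}} - t)$ elements while the first $(t - 1)$ elements of $\tilde{\mathbf{Y}}_{\backslash t}$ are the same
as those of the original one $\mathbf{Y}_{\backslash t}$. From (1) and (11), the matrix $\tilde{\mathbf{Y}}_{\backslash t}$ is explicitly given by

$$\tilde{\mathbf{Y}}_{\backslash t} = \frac{1}{\sqrt{M}} \mathbf{H} \mathbf{X}_{\backslash t} + (\mathbf{O}, \mathbf{W}_{(t,T_{\mathrm{c}}]}) + \mathbf{N}_{\backslash t}, \tag{13}$$

with $\mathbf{W}_{(t,T_{\mathrm{c}}]} = (\mathbf{w}_{t+1}, \ldots, \mathbf{w}_{T_{\mathrm{c}}}) \in \mathbb{C}^{N \times (T_{\mathrm{c}} - t)}$. In (13), the matrix $\mathbf{N}_{\backslash t} \in \mathbb{C}^{N \times (T_{\mathrm{c}} - 1)}$ is obtained by eliminating
the $t$th column vector from the noise matrix $\mathbf{N}$.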
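The covariance assigned to the AWGN term in the virtual channel can be verified in a few lines. The derivation below is a sketch that uses only the stated assumptions: the entries of the channel matrix are i.i.d. $\mathcal{CN}(0,1)$ and independent of the data symbols, and the data symbols are independent across antennas with mean $\theta_{m,t'}$ and power $P$. The shorthand $\mathbf{D}_{t'}$ is introduced here for convenience and does not appear in the paper.

```latex
\begin{align*}
\mathbf{D}_{t'} &:= \mathbb{E}\!\left[(\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'})(\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'})^{H} \,\middle|\, \boldsymbol{\theta}_{t'}\right]
  = \mathrm{diag}\!\left(P - |\theta_{1,t'}|^{2}, \ldots, P - |\theta_{M,t'}|^{2}\right), \\
\mathrm{cov}\!\left[M^{-1/2}\mathbf{H}(\mathbf{x}_{t'} - \boldsymbol{\theta}_{t'}) \,\middle|\, \boldsymbol{\theta}_{t'}\right]
  &= \frac{1}{M}\,\mathbb{E}\!\left[\mathbf{H}\mathbf{D}_{t'}\mathbf{H}^{H}\right]
  = \frac{1}{M}\sum_{m=1}^{M}\left(P - |\theta_{m,t'}|^{2}\right)\mathbf{I}_{N}
  = \left(P - \sigma_{t'}^{2}\right)\mathbf{I}_{N}.
\end{align*}
```

The second line uses $\mathbb{E}[h_{n,m} h_{n',m'}^{*}] = \delta_{n,n'}\delta_{m,m'}$, so only the diagonal of $\mathbf{D}_{t'}$ contributes and distinct receive antennas are uncorrelated.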

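Let us consider channel estimation based on the information $\tilde{\mathcal{I}}_t$. The optimal channel estimator for this case
constructs the joint posterior pdf $p(\mathbf{H} | \tilde{\mathcal{I}}_t)$ and feeds it to the detector with SIC. The joint posterior pdf of $\mathbf{H}$ given
$\tilde{\mathcal{I}}_t$ is defined by (9) in which the pdf (10) for $t' = t + 1, \ldots, T_{\mathrm{c}}$ is replaced by

$$p(\tilde{\mathbf{y}}_{t'} | \mathbf{H}, \boldsymbol{\theta}_{t'}) = \int p(\tilde{\mathbf{y}}_{t'} | \mathbf{H}, \boldsymbol{\theta}_{t'}, \mathbf{w}_{t'}) p(\mathbf{w}_{t'} | \boldsymbol{\theta}_{t'}) \, d\mathbf{w}_{t'}, \tag{14}$$

where $p(\tilde{\mathbf{y}}_{t'} | \mathbf{H}, \boldsymbol{\theta}_{t'}, \mathbf{w}_{t'})$ represents the virtual MIMO channel (11). A straightforward calculation indicates that
the joint posterior pdf $p(\mathbf{H} | \tilde{\mathcal{I}}_t)$ is a proper complex Gaussian pdf with mean $\hat{\mathbf{H}}_t \in \mathbb{C}^{N \times M}$ and covariance
$\mathrm{cov}[(\mathbf{h}_1, \ldots, \mathbf{h}_N)^T | \tilde{\mathcal{I}}_t] = \mathbf{I}_N \otimes \boldsymbol{\Xi}_t$, given by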



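$$\hat{\mathbf{H}}_t = \frac{1}{\sqrt{M}} \left( \frac{1}{N_0} \sum_{t'=1}^{t-1} \mathbf{y}_{t'} \mathbf{x}_{t'}^{H} + \sum_{t'=t+1}^{T_{\mathrm{c}}} \frac{\tilde{\mathbf{y}}_{t'} \boldsymbol{\theta}_{t'}^{H}}{N_0 + P - \sigma_{t'}^2} \right) \boldsymbol{\Xi}_t, \tag{15}$$

$$\boldsymbol{\Xi}_t = \left( \mathbf{I}_M + \frac{1}{M N_0} \sum_{t'=1}^{t-1} \mathbf{x}_{t'} \mathbf{x}_{t'}^{H} + \frac{1}{M} \sum_{t'=t+1}^{T_{\mathrm{c}}} \frac{\boldsymbol{\theta}_{t'} \boldsymbol{\theta}_{t'}^{H}}{N_0 + P - \sigma_{t'}^2} \right)^{-1}. \tag{16}$$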

The posterior mean $\hat{\mathbf{H}}_t$ coincides with the LMMSE estimator of $\mathbf{H}$ based on the received matrix $\tilde{\mathbf{Y}}_{\backslash t}$ and the known
information $\mathbf{X}_{\backslash t}$. Furthermore, $\boldsymbol{\Xi}_t$ is equal to the error covariance matrix of the LMMSE estimator $\hat{\mathbf{h}}_{t,1} \in \mathbb{C}^{1 \times M}$
for the first row vector of $\mathbf{H}$. Thus, we refer to the optimal channel estimator for the virtual MIMO channel (11)
as the LMMSE channel estimator.
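The behaviour of the LMMSE channel estimator is easiest to see in the scalar case $M = N = 1$. The sketch below is an illustration under assumptions, not the paper's implementation: a single coefficient $h \sim \mathcal{CN}(0,1)$ is observed through slots $y_k = h s_k + w_k$, where $s_k$ is either a decoded data symbol (noise variance $N_0$) or a bias value $\theta$ (noise variance $N_0 + P - |\theta|^2$, as in the virtual channel). All function and variable names are hypothetical.

```python
import math
import random


def crandn(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def lmmse_scalar(ys, ss, lams):
    """LMMSE estimate of h ~ CN(0, 1) from y_k = h * s_k + w_k, w_k ~ CN(0, lam_k)."""
    # Posterior (error) variance of h given all observations.
    xi = 1.0 / (1.0 + sum(abs(s) ** 2 / lam for s, lam in zip(ss, lams)))
    # Matched filtering with per-slot noise weighting, scaled by the error variance.
    h_hat = xi * sum(y * s.conjugate() / lam for y, s, lam in zip(ys, ss, lams))
    return h_hat, xi


def empirical_mse(ss, lams, trials, rng):
    """Monte Carlo estimate of E|h - h_hat|^2 for the scalar virtual channel."""
    total = 0.0
    for _ in range(trials):
        h = crandn(rng)
        ys = [h * s + crandn(rng, lam) for s, lam in zip(ss, lams)]
        h_hat, _ = lmmse_scalar(ys, ss, lams)
        total += abs(h - h_hat) ** 2
    return total / trials


P, N0, theta = 1.0, 0.5, 0.4 + 0.0j
ss = [1 + 0j, 1j, -1 + 0j, theta, theta]   # three decoded symbols, two bias-only slots
lams = [N0, N0, N0, N0 + P - abs(theta) ** 2, N0 + P - abs(theta) ** 2]
_, xi = lmmse_scalar([0j] * len(ss), ss, lams)
mse = empirical_mse(ss, lams, trials=20000, rng=random.Random(2))
print(xi, mse)  # the simulated error should be close to the predicted variance
```

Slots that carry only a bias contribute far less to reducing the error variance than decoded symbols, because their effective signal power is $|\theta|^2$ and their effective noise is inflated by $P - |\theta|^2$.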
Note that the linear filter given by (15) provides the LMMSE estimates of $\mathbf{H}$ for the original MIMO channel (1).
One should not confuse the LMMSE channel estimator for the virtual MIMO channel (11) with that for the original
MIMO channel (1). The former, which is considered in this paper, is the optimal channel estimator for the virtual
MIMO channel (11), while the latter is a suboptimal channel estimator for the original MIMO channel (1).

C. Detector 

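We focus on substage $m$ within stage $t$ and define the optimal detector with SIC, which achieves the lower
bound (12). The optimal detector with SIC feeds to the associated decoder the posterior pdf³ of $x_{m,t}$ based on
the knowledge about the received vector $\mathbf{y}_t$, the data symbols $\mathbf{x}_{[1,m),t}$ decoded in the preceding substages, the
bias $\boldsymbol{\theta}_{[m,M],t}$ for the unknown data symbols $\mathbf{x}_{[m,M],t} = (x_{m,t}, \ldots, x_{M,t})^T$, and the joint posterior pdf $p(\mathbf{H} | \tilde{\mathcal{I}}_t)$
provided by the LMMSE channel estimator, given by

$$p(x_{m,t} | \mathbf{y}_t, \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}) = \frac{\int p(\mathbf{y}_t | \mathbf{x}_t, \tilde{\mathcal{I}}_t) p(\mathbf{x}_{[m,M],t} | \boldsymbol{\theta}_{[m,M],t}) \, d\mathbf{x}_{(m,M],t}}{\int p(\mathbf{y}_t | \mathbf{x}_t, \tilde{\mathcal{I}}_t) p(\mathbf{x}_{[m,M],t} | \boldsymbol{\theta}_{[m,M],t}) \, d\mathbf{x}_{[m,M],t}}, \tag{17}$$

with $\mathbf{x}_{(m,M],t} = (x_{m+1,t}, \ldots, x_{M,t})^T \in \mathbb{C}^{M-m}$. In (17), the pdf $p(\mathbf{y}_t | \mathbf{x}_t, \tilde{\mathcal{I}}_t)$ is given by

$$p(\mathbf{y}_t | \mathbf{x}_t, \tilde{\mathcal{I}}_t) = \int p(\mathbf{y}_t | \mathbf{H}, \mathbf{x}_t) p(\mathbf{H} | \tilde{\mathcal{I}}_t) \, d\mathbf{H}, \tag{18}$$

where $p(\mathbf{y}_t | \mathbf{H}, \mathbf{x}_t)$ represents the MIMO channel (1). The use of SIC appears in the pdf (18), which is a proper
complex Gaussian pdf with mean $M^{-1/2} \hat{\mathbf{H}}_t \mathbf{x}_t$ and covariance $(N_0 + M^{-1} \mathbf{x}_t^{H} \boldsymbol{\Xi}_t \mathbf{x}_t) \mathbf{I}_N$. Expression (17) implies that
the optimal detector with SIC subtracts the known inter-stream interference $M^{-1/2} \hat{\mathbf{H}}_t (\mathbf{x}_{[1,m),t}^T, 0, \boldsymbol{\theta}_{(m,M],t}^T)^T$ from
the received vector $\mathbf{y}_t$, with $\boldsymbol{\theta}_{(m,M],t} = (\theta_{m+1,t}, \ldots, \theta_{M,t})^T$, and then mitigates residual inter-stream interference
by performing the optimal nonlinear MUD.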

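Let $\tilde{x}_{m,t} \in \mathbb{C}$ denote a random variable representing the marginal posterior pdf (17). Since the lower bound (12)
is equal to the mutual information $I(x_{m,t}; \tilde{x}_{m,t} | \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t})$, the achievable rate $C$ of the optimal SD
receiver is bounded from below by

$$C \ge \frac{1}{T_{\mathrm{c}}} \sum_{t=T_{\mathrm{tr}}+1}^{T_{\mathrm{c}}} \sum_{m=1}^{M} I(x_{m,t}; \tilde{x}_{m,t} | \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}), \tag{19}$$

which is given via the equivalent channel between the data symbol $x_{m,t}$ and the associated decoder

$$\begin{aligned}
&p(\tilde{x}_{m,t} | x_{m,t}, \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}) \\
&\quad = \int p(\tilde{x}_{m,t} | \mathbf{y}_t, \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t}) p(\mathbf{y}_t | \mathbf{x}_t, \tilde{\mathcal{I}}_t) p(\mathbf{x}_{(m,M],t} | \boldsymbol{\theta}_{(m,M],t}) \, d\mathbf{x}_{(m,M],t} \, d\mathbf{y}_t.
\end{aligned} \tag{20}$$

The goal of this paper is to optimize the length of training phase $T_{\mathrm{tr}}$, the prior pdf of the data symbols, and the
hyperprior pdf of the bias in terms of the lower bound (19) based on the LMMSE channel estimation.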

³ The marginal posterior pdf (17) is replaced by the posterior pmf of $x_{m,t}$ if $x_{m,t}$ is a discrete random variable.
IV. Main Results

A. Large-System Analysis 

We focus on substage $m$ within stage $t$ in the SD receiver. In order to calculate the conditional mutual information
$I(x_{m,t}; \tilde{x}_{m,t} | \tilde{\mathcal{I}}_t, \mathbf{x}_{[1,m),t}, \boldsymbol{\theta}_{[m,M],t})$, we have to evaluate the distribution of the equivalent channel (20), which is
a probability distribution on the space of distributions and depends on the omitted variables $\boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)}$ and $\boldsymbol{\theta}_{[1,m),t}$
implicitly through the posterior pdf $p(\mathbf{H} | \tilde{\mathcal{I}}_t)$ and $\mathbf{x}_{[1,m),t}$. This evaluation is quite difficult in general for finite-sized
systems. A key assumption for circumventing this difficulty is the assumption of the large-system limit in which
$M$, $N$, $T_{\mathrm{c}}$, $T_{\mathrm{tr}}$, $t$, and $m$ tend to infinity while their ratios $\alpha = M/N$, $\beta = M/T_{\mathrm{c}}$, $\tau_0 = T_{\mathrm{tr}}/T_{\mathrm{c}}$, $\tau = t/T_{\mathrm{c}}$, and
$\mu = m/M$ are kept constant. The self-averaging property for the equivalent channel (20) is expected to hold in
the large-system limit: The distribution of the equivalent channel (20) converges to a Dirac measure on the space
of distributions in the large-system limit. Under the assumption of the self-averaging property, the replica method
allows us to analyze the equivalent channel (20) in the large-system limit.

The self-averaging properties are classified into those for extensive quantities and those for non-extensive quan- 
tities. The former quantities are proportional to the size of systems, while the latter quantities are not. The self- 
averaging property for extensive quantities, such as sum capacity and the so-called free-energy in statistical physics, 
has been rigorously justified for linear systems [21] and general systems [43], [44]. It might be possible to prove
the self-averaging property for the lower bound (19) by using the method developed in [43], [44]. However, we
need the self-averaging property for each equivalent channel (20), which is a non-extensive quantity.

The self-averaging property for equivalent channels has been rigorously proved in the case of linear receivers by
using random matrix theory [20], [22], while its justification is still open for nonlinear receivers. Note that the
self-averaging property for equivalent channels would not hold if the systems had complicated structures corresponding
to replica-symmetry breaking (RSB) [33], [45]. Fortunately, Nishimori's rigorous result [46] suggests that the system
considered in this paper should have a simple structure corresponding to replica symmetry (RS). A recent rigorous
study [47] also supports the RS assumption. Thus, the self-averaging property is expected to hold. See [16] for
an intuitive interpretation of the RS assumption. The formal definition of the RS assumption will be presented in
Appendices B and C.

We also need the self-averaging property for each element of the error covariance matrix (16), along with that
for the equivalent channel (20). Note that the error covariance matrix (16) is a random matrix depending on $\mathbf{X}_{\backslash t}$
explicitly and on $\boldsymbol{\Theta}_{(T_{\mathrm{tr}},t)}$ implicitly through the data symbol vectors decoded in the preceding stages. See [48] for
the self-averaging property of each diagonal element for random covariance matrices.

Assumption 1. Each element of the error covariance matrix (16) for the LMMSE channel estimation converges in probability to a deterministic value in the large-system limit, i.e.,

\[
(\Xi_t)_{\tilde{m},\tilde{m}'} \to
\begin{cases}
\xi^2(\tau) & \text{for } \tilde{m} = \tilde{m}', \\
\rho(\tau) & \text{for } \tilde{m} < \tilde{m}', \\
\rho^*(\tau) & \text{for } \tilde{m} > \tilde{m}'.
\end{cases}
\tag{21}
\]

The error covariance matrix (16) does not depend on the number of receive antennas $N$ or the current substage $m$. Thus, the limits (21) do not depend on $\alpha$ or $\mu$, while they may depend on $\beta$, $\tau_0$, and $\tau$. Assumption 1 has been rigorously proved for the unbiased case $\theta_{m,t} = 0$ in [22].



Assumption 2. The equivalent channel ( |2Q| ) is self-averaging with respect to Y\t and ( I2QD converges 

in law to a conditional pdf of x^ t given j, X\i, and df, which does not depend on Y\i or X\^i „^-^ ^, in the 
large-system limit. 

The equivalent channel (20) is also expected to be self-averaging with respect to the other random variables. However, Assumption 2 is sufficient for using the replica method. We postulate Assumptions 1 and 2 since their justification is beyond the scope of this paper.

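The lower bound (19) is given as a double integral of the constrained capacity for AWGN channels. We first derive the AWGN channels in a heuristic manner. The heuristic derivation described below provides an intuitive interpretation of the AWGN channels, although the formal derivation is based on the replica method. In the current stage $t = \tau T_c$, with $\tau_0 \le \tau \le 1$, we consider fading channels with time diversity for channel estimation,

\[
y_{n,t'} = \frac{1}{\sqrt{M}} h_{n,m} x_{m,t'} + w^{\mathrm{tr}}_{n,t'}, \quad \text{for all } t' < t, \tag{22}
\]

\[
\tilde{y}_{n,t'} = \frac{1}{\sqrt{M}} h_{n,m} \theta_{m,t'} + w^{\mathrm{c}}_{n,t'}, \quad \text{for all } t' > t, \tag{23}
\]

with $w^{\mathrm{tr}}_{n,t'} \sim \mathcal{CN}(0, \sigma_{\mathrm{tr}}^2)$ for $t' < t$ and with $w^{\mathrm{c}}_{n,t'} \sim \mathcal{CN}(0, \sigma_{\mathrm{c}}^2)$ for $t' > t$. The fading channels are obtained by extracting the first terms in (22) and (23) from the original and virtual MIMO channels (1) and (11), and then by approximating the remaining terms by circularly symmetric complex Gaussian random vectors with covariance $\sigma_{\mathrm{tr}}^2 I_N$ and $\sigma_{\mathrm{c}}^2 I_N$, respectively. We apply maximal-ratio combining (MRC) to (22) and (23),

\[
z_{\mathrm{tr}} = \frac{1}{\sqrt{tP}} \sum_{t' < t} x_{m,t'}^* y_{n,t'}, \tag{24}
\]

\[
z_{\mathrm{c}} = \frac{1}{\sqrt{(T_c - t)\sigma_\theta^2}} \sum_{t' > t} \theta_{m,t'}^* \tilde{y}_{n,t'}. \tag{25}
\]

Taking the large-system limit, due to the weak law of large numbers, we obtain

\[
z_{\mathrm{tr}} = \sqrt{\frac{\tau P}{\beta}}\, h_{n,m} + w_{\mathrm{tr}}, \qquad
z_{\mathrm{c}} = \sqrt{\frac{(1-\tau)\sigma_\theta^2}{\beta}}\, h_{n,m} + w_{\mathrm{c}}, \tag{26}
\]

where $w_{\mathrm{tr}}$ and $w_{\mathrm{c}}$ are mutually independent circularly symmetric complex Gaussian random variables with variances $\sigma_{\mathrm{tr}}^2$ and $\sigma_{\mathrm{c}}^2$, respectively. The minimum mean-squared error (MMSE) estimate of $h_{n,m}$ for the channel (26) is given by [49]

\[
\hat{h}_{n,m} = \xi^2(\tau) \left( \frac{1}{\sigma_{\mathrm{tr}}^2} \sqrt{\frac{\tau P}{\beta}}\, z_{\mathrm{tr}} + \frac{1}{\sigma_{\mathrm{c}}^2} \sqrt{\frac{(1-\tau)\sigma_\theta^2}{\beta}}\, z_{\mathrm{c}} \right), \tag{27}
\]

where the mean-squared error (MSE) $\xi^2(\tau)$ for the MMSE estimate (27) is explicitly given by

\[
\xi^2(\tau) = \left( 1 + \frac{\tau P}{\beta \sigma_{\mathrm{tr}}^2} + \frac{(1-\tau)\sigma_\theta^2}{\beta \sigma_{\mathrm{c}}^2} \right)^{-1}. \tag{28}
\]

It is well known that the MMSE estimate $\hat{h}_{n,m}$ and the estimation error $\Delta h_{n,m} = h_{n,m} - \hat{h}_{n,m}$ are uncorrelated circularly symmetric complex Gaussian random variables with variances $1 - \xi^2(\tau)$ and $\xi^2(\tau)$, respectively.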

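We next consider fading channels with spatial diversity for data estimation in the current substage $m = \mu M$ for $0 \le \mu \le 1$,

\[
y_{n,t} = \frac{1}{\sqrt{M}} \left( \hat{h}_{n,m} x_{m,t} + \Delta h_{n,m} x_{m,t} \right) + w_{n,t}, \quad n = 1, \ldots, N, \tag{29}
\]

where $w_{n,t} \in \mathbb{C}$ denotes a circularly symmetric complex Gaussian random variable with variance $\sigma^2(\tau, \mu)$. Applying the MRC to $\{y_{n,t}\}$, we obtain

\[
z = \frac{1}{\sqrt{N(1 - \xi^2(\tau))}} \sum_{n=1}^{N} \hat{h}_{n,m}^* y_{n,t}. \tag{30}
\]

Taking the large-system limit gives the AWGN channel

\[
z = \sqrt{\frac{1 - \xi^2(\tau)}{\alpha}}\, x_{m,t} + w, \tag{31}
\]

with $w \sim \mathcal{CN}(0, \sigma^2(\tau, \mu))$. The MMSE estimate $\hat{x}_{m,t}$ of $x_{m,t}$ for the AWGN channel (31) is given as the mean $\hat{x}_{m,t} = \int x_{m,t}\, p(x_{m,t}|z, \theta_{m,t})\, dx_{m,t}$ with respect to the posterior pdf

\[
p(x_{m,t}|z, \theta_{m,t}) = \frac{p(z|x_{m,t})\, p(x_{m,t}|\theta_{m,t})}{\int p(z|x_{m,t})\, p(x_{m,t}|\theta_{m,t})\, dx_{m,t}}, \tag{32}
\]

where $p(z|x_{m,t})$ represents the AWGN channel (31). The MSE for the MMSE estimate $\hat{x}_{m,t}$ given $\theta_{m,t}$ is defined as

\[
\mathrm{MSE}(\sigma^2, \theta_{m,t}) = \mathbb{E}\left[ |x_{m,t} - \hat{x}_{m,t}|^2 \,\middle|\, \theta_{m,t} \right]. \tag{33}
\]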

We have not so far specified the variances $\sigma_{\mathrm{tr}}^2$, $\sigma_{\mathrm{c}}^2$, and $\sigma^2(\tau, \mu)$. The constrained capacity of the AWGN channel (31) provides a lower bound of the constrained capacity (7) by determining the three variances as solutions to fixed-point equations.

Proposition 1. Suppose that Assumption 1, Assumption 2, and the RS assumption hold. Then, the constrained capacity $C$ per transmit antenna is bounded from below by

\[
C \ge \int_{\tau \in [\tau_0, 1]} \int_{\mu \in [0, 1]} I(x_{m,t}; z | \theta_{m,t})\, d\tau\, d\mu, \tag{34}
\]

in the large-system limit, in which the mutual information $I(x_{m,t}; z | \theta_{m,t})$ is equal to the constrained capacity of the AWGN channel (31). In evaluating $I(x_{m,t}; z | \theta_{m,t})$, $(\sigma_{\mathrm{tr}}^2, \sigma_{\mathrm{c}}^2)$ is given as the solution to the coupled fixed-point equations

\[
\sigma_{\mathrm{tr}}^2 = N_0 + P \xi^2(\tau), \tag{35}
\]

\[
\sigma_{\mathrm{c}}^2 = N_0 + (P - \sigma_\theta^2) + \sigma_\theta^2 \xi^2(\tau), \tag{36}
\]

where $\xi^2(\tau)$ is given by (28). Furthermore, $\sigma^2(\tau, \mu)$ is given as a solution to the fixed-point equation

\[
\sigma^2(\tau, \mu) = N_0 + P \xi^2(\tau) + (1 - \mu)(1 - \xi^2(\tau))\, \mathbb{E}\left[ \mathrm{MSE}(\sigma^2(\tau, \mu), \theta_{m,t}) \right], \tag{37}
\]
where $\mathrm{MSE}(\sigma^2, \theta_{m,t})$ is given by (33). If the fixed-point equation (37) has multiple solutions, one should choose the solution minimizing the following quantity

\[
(1 - \mu)\, I(x_{m,t}; z | \theta_{m,t}) + \frac{1}{\alpha} \left[ D_2(N_0 \| \sigma^2) + \frac{P \xi^2(\tau)}{\sigma^2} \log_2 e \right]. \tag{38}
\]

Derivation of Proposition 1: See Appendix A. ■
Note that the last terms in the coupled fixed-point equations (35) and (36) depend on $\sigma_{\mathrm{tr}}^2$ and $\sigma_{\mathrm{c}}^2$ through (28). Equation (37) for given $\xi^2(\tau)$ provides a fixed-point equation with respect to $\sigma^2$. The second and last terms in the right-hand side of (37) correspond to contributions from channel estimation errors and inter-stream interference, respectively. The integrand in (34) depends on the variables $\tau$ and $\mu$ through the SNR $P(1 - \xi^2(\tau))/(\alpha \sigma^2(\tau, \mu))$.
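The coupling between (35), (36), and (28) can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes the MSE has the form $\xi^2 = (1 + \tau P/(\beta\sigma_{\mathrm{tr}}^2) + (1-\tau)\sigma_\theta^2/(\beta\sigma_{\mathrm{c}}^2))^{-1}$ and that the two variances satisfy $\sigma_{\mathrm{tr}}^2 = N_0 + P\xi^2$ and $\sigma_{\mathrm{c}}^2 = N_0 + (P - \sigma_\theta^2) + \sigma_\theta^2\xi^2$. The function names and the plain fixed-point iteration started from the upper bound $N_0 + P$ are illustrative choices.

```python
def xi2(tau, P, beta, sigma_tr2, sigma_c2, sigma_theta2):
    """Assumed form of the channel-estimation MSE (28)."""
    return 1.0 / (1.0
                  + tau * P / (beta * sigma_tr2)
                  + (1.0 - tau) * sigma_theta2 / (beta * sigma_c2))


def solve_training_variances(tau, P, N0, beta, sigma_theta2,
                             tol=1e-12, max_iter=10000):
    """Iterate the assumed forms of (35)-(36) until the update is below tol.

    Returns (sigma_tr2, sigma_c2, xi2) evaluated at the last iterate.
    """
    # Both variances are at most N0 + P, so start from that upper bound.
    sigma_tr2 = sigma_c2 = N0 + P
    for _ in range(max_iter):
        e = xi2(tau, P, beta, sigma_tr2, sigma_c2, sigma_theta2)
        new_tr = N0 + P * e
        new_c = N0 + (P - sigma_theta2) + sigma_theta2 * e
        step = abs(new_tr - sigma_tr2) + abs(new_c - sigma_c2)
        sigma_tr2, sigma_c2 = new_tr, new_c
        if step < tol:
            break
    return sigma_tr2, sigma_c2, xi2(tau, P, beta, sigma_tr2, sigma_c2,
                                    sigma_theta2)
```

In the unbiased case ($\sigma_\theta^2 = 0$) the second equation decouples and the first reduces to a quadratic in $\sigma_{\mathrm{tr}}^2$, which gives a convenient check on the iteration.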

The existence of multiple solutions in (37) relates to the so-called phase transition in statistical physics. See [24], [27] for an interpretation in the context of communications. Numerical evaluation of (37) for QPSK modulation implies that multiple solutions do not appear when $\alpha \le 1$. In the high SNR regime there is no point in using more transmit antennas than receive antennas, or more than half the coherence time. In fact, Zheng and Tse [10] proved that the full spatial multiplexing gain of the MIMO channel with no CSI is given by $M^*(1 - M^*/T_c)$, with $M^* = \min\{M, N, \lfloor T_c/2 \rfloor\}$, which is achieved by using $\min\{N, \lfloor T_c/2 \rfloor\}$ transmit antennas out of $M$ transmit antennas if $M > N$ or $M > \lfloor T_c/2 \rfloor$. Thus, we consider $\alpha \le 1$ and $\beta \le 1/2$ in the high SNR regime.
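The remark on multiple solutions for QPSK can be explored with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the evaluation used in the paper: it assumes unbiased QPSK with symbol power $P$ on the AWGN channel $z = \sqrt{(1-\xi^2)/\alpha}\,x + w$, for which the symbol MSE equals $P$ times the standard BPSK MMSE at per-dimension SNR $(1-\xi^2)P/(\alpha\sigma^2)$, and it assumes the fixed-point equation has the form $\sigma^2 = N_0 + P\xi^2 + (1-\mu)(1-\xi^2)\,\mathrm{MSE}$. The value of $\xi^2$ is treated as a given input, and all function names are hypothetical.

```python
import numpy as np

# Gauss-Hermite rule: sum(w * f(x)) approximates the integral of exp(-x^2) f(x).
_X, _W = np.polynomial.hermite.hermgauss(80)


def mmse_bpsk(snr):
    """MMSE of a unit-power BPSK symbol observed in real Gaussian noise."""
    y = np.sqrt(2.0) * _X  # change of variables to a standard normal
    mean_tanh = float(np.sum(_W * np.tanh(snr - np.sqrt(snr) * y)))
    return 1.0 - mean_tanh / np.sqrt(np.pi)


def rhs_qpsk(sigma2, mu, xi2, P, N0, alpha):
    """Right-hand side of the assumed fixed-point equation for unbiased QPSK."""
    snr = (1.0 - xi2) * P / (alpha * sigma2)
    mse = P * mmse_bpsk(snr)
    return N0 + P * xi2 + (1.0 - mu) * (1.0 - xi2) * mse


def count_fixed_points(mu, xi2, P, N0, alpha, grid=4000):
    """Count sign changes of rhs(sigma2) - sigma2 on a grid of sigma2 values."""
    sigma2 = np.linspace(1e-3, N0 + 2.0 * P, grid)
    gap = np.array([rhs_qpsk(s, mu, xi2, P, N0, alpha) - s for s in sigma2])
    return int(np.sum(np.sign(gap[:-1]) != np.sign(gap[1:])))
```

Calling `count_fixed_points` for several values of `alpha` gives a quick way to see where more than one crossing starts to appear; a grid search of this kind can miss crossings that are closer together than the grid spacing.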



B. Optimization 

The next goal is to optimize the lower bound (34) with respect to $\tau_0$, the hyperprior pdf of $\theta_{m,t}$, and the prior pdf of $x_{m,t}$. We first notice that the lower bound (34) is monotonically nonincreasing with respect to $\tau_0$ since the integrand is non-negative and does not depend on $\tau_0$. Thus, the lower bound (34) is maximized for $\tau_0 \to 0$. Note that the limit $\tau_0 \to 0$ does not necessarily indicate no pilot symbols, since we have taken the limit $\tau_0 \to 0$ after the large-system limit. In other words, the effect of pilot symbols is neglected in Proposition 1 if $T_{\mathrm{tr}}$ is sublinear in $T_c$, i.e., $T_{\mathrm{tr}} = o(T_c)$.

Next, we maximize the lower bound (34) with respect to the hyperprior pdf of $\theta_{m,t}$. For a fixed hyperprior pdf, the SNR $P(1 - \xi^2(\tau))/(\alpha \sigma^2(\tau, \mu))$ improves as the variance $\sigma_\theta^2$ grows, since the increase of $\sigma_\theta^2$ results in reductions of the channel estimation error $\xi^2(\tau)$ and the inter-stream interference given by the last term in the right-hand side of (37). However, increasing $\sigma_\theta^2$ reduces the mutual information $I(x_{m,t}; z | \theta_{m,t})$ for a fixed SNR, due to the reduction of payload. Interestingly, numerical results presented in Section V show that the lower bound (34) is maximized as $\sigma_\theta^2 \to 0$ for a fixed hyperprior pdf of $\theta_{m,t}$. This indicates that the lower bound (34) is maximized when $\theta_{m,t} = 0$ with probability one.

The arguments described above indicate that negligible pilot information, more precisely, the limits $\tau_0, \sigma_\theta^2 \to 0$, are best in the large-system limit, while $\tau_0 = \beta$ is best in terms of the HH bound [14]. Thus, we can conclude that the SD scheme can reduce the overhead for training significantly. It is worth noting that using a capacity-achieving error-correcting code is assumed in our analysis. We conjecture that, if some practical coding is used, finite $\tau_0$ or $\sigma_\theta^2$ is required for getting accurate channel estimates in the initial stage. See [50] for the case of practical coding.
Finally, we optimize the lower bound (34) with respect to the prior pdf of $x_{m,t}$. This optimization problem is nonlinear since $\sigma^2(\tau, \mu)$ depends on the prior pdf of $x_{m,t}$ through the last term in the right-hand side of the fixed-point equation (37). Instead of solving the nonlinear optimization problem exactly, we consider the biased Gaussian signaling $x_{m,t} \sim \mathcal{CN}(\theta_{m,t}, P - |\theta_{m,t}|^2)$ as a suboptimal solution. This choice of the prior pdf should be reasonable since Gaussian signaling is optimal if accurate channel estimates can be obtained. In this case, Proposition 1 reduces to the following corollary.

Corollary 1. Suppose that Assumption 1, Assumption 2, and the RS assumption hold. If $x_{m,t} \sim \mathcal{CN}(\theta_{m,t}, P - |\theta_{m,t}|^2)$, then the constrained capacity per transmit antenna is bounded from below by $C_g$, given by

\[
C \ge C_g = \int_{\tau \in [\tau_0, 1]} \int_{\mu \in [0, 1]} \mathbb{E}\left[ \log\left( 1 + \frac{(1 - \xi^2(\tau))(P - |\theta_{m,t}|^2)}{\alpha \sigma^2(\tau, \mu)} \right) \right] d\tau\, d\mu, \tag{39}
\]

in the large-system limit. In evaluating the integrand in (39), $\xi^2(\tau)$ is given by (28), defined via (35) and (36). Furthermore, $\sigma^2(\tau, \mu)$ is given as the unique solution to the fixed-point equation

\[
\sigma^2(\tau, \mu) = N_0 + P \xi^2(\tau) + (1 - \mu)(1 - \xi^2(\tau))\, \mathbb{E}\left[ \frac{\alpha (P - |\theta_{m,t}|^2)\, \sigma^2(\tau, \mu)}{(1 - \xi^2(\tau))(P - |\theta_{m,t}|^2) + \alpha \sigma^2(\tau, \mu)} \right]. \tag{40}
\]

Proof of Corollary 1: It is straightforward to confirm that the lower bound (34) and the fixed-point equation (37) reduce to (39) and (40), respectively. Thus, we only prove the uniqueness of the solution to the fixed-point equation (40). The right-hand side of (40) is a concave function of $\sigma^2$, which intersects with a straight line passing the origin with slope 1 at two points. Since the concave function passes the point $(0, N_0 + P\xi^2(\tau))$, which is above the origin, one intersection must be in $\sigma^2 < 0$. Thus, the fixed-point equation (40) has the unique solution in the region $\sigma^2 > 0$. ■

We believe that the biased Gaussian signaling maximizes the quantity (38), following the argument in [51]. If $\xi^2(\tau) = 0$ and $\sigma^2 = N_0$ were satisfied for all $\tau$ and $\mu$, the biased Gaussian signaling would maximize the lower bound (34). However, $\xi^2(\tau)$ is bounded from below by a positive value for $\tau < \beta$. This implies the suboptimality of the i.i.d. Gaussian signaling.
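For the unbiased case ($\theta_{m,t} = 0$, $\tau_0 = 0$), the bound of Corollary 1 can be evaluated in a few lines, since both fixed-point equations become quadratics. The following is a hedged sketch rather than the authors' code. It assumes $\xi^2 = \beta\sigma_{\mathrm{tr}}^2/(\beta\sigma_{\mathrm{tr}}^2 + \tau P)$ with $\sigma_{\mathrm{tr}}^2 = N_0 + P\xi^2$, the fixed-point equation $\sigma^2 = N_0 + P\xi^2 + (1-\mu)(1-\xi^2)\,\alpha P\sigma^2/((1-\xi^2)P + \alpha\sigma^2)$, and a rate integrand $\log_2(1 + (1-\xi^2)P/(\alpha\sigma^2))$; the base-2 logarithm, the midpoint integration rule, and the function name are illustrative assumptions.

```python
import numpy as np


def unbiased_gaussian_rate(P, N0, alpha, beta, grid=400):
    """Midpoint-rule estimate of the double integral over tau and mu in [0, 1]."""
    tau = (np.arange(grid) + 0.5) / grid
    mu = (np.arange(grid) + 0.5) / grid

    # sigma_tr^2 is the positive root of
    #   beta x^2 + (tau P - beta (N0 + P)) x - N0 tau P = 0.
    b = tau * P - beta * (N0 + P)
    sigma_tr2 = (-b + np.sqrt(b * b + 4.0 * beta * N0 * tau * P)) / (2.0 * beta)
    xi2 = beta * sigma_tr2 / (beta * sigma_tr2 + tau * P)

    g = (1.0 - xi2)[:, None]      # power of the channel estimate, per tau
    a = (N0 + P * xi2)[:, None]   # noise plus channel-estimation error
    c = (1.0 - mu)[None, :]       # fraction of streams not yet decoded

    # sigma^2 is the positive root of
    #   alpha s^2 + (g P - alpha a - c g alpha P) s - a g P = 0.
    b2 = g * P - alpha * a - c * g * alpha * P
    sigma2 = (-b2 + np.sqrt(b2 * b2 + 4.0 * alpha * a * g * P)) / (2.0 * alpha)

    snr = g * P / (alpha * sigma2)
    return float(np.mean(np.log2(1.0 + snr)))
```

Sweeping `N0` for fixed `alpha` and `beta` produces a rate-versus-SNR curve of the kind compared against other bounds in Section V, up to the assumptions listed above.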

C. High SNR Regime 

In the high SNR limit $N_0 \to 0$, the lower bound (39) is shown to achieve the full spatial multiplexing gain when $T_{\mathrm{tr}} \le M$.

Proposition 2. Suppose that Assumption 1, Assumption 2, and the RS assumption hold. For $\alpha \le 1$ and $\tau_0 \le \beta \le 1/2$, the lower bound (39) with biased Gaussian signaling $x_{m,t} \sim \mathcal{CN}(\theta_{m,t}, P - |\theta_{m,t}|^2)$ achieves the full spatial multiplexing gain in the high SNR limit $N_0 \to 0$, i.e.,

\[
\liminf_{N_0 \to 0} \frac{C_g}{\log(P/N_0)} = 1 - \beta. \tag{41}
\]

Proof of Proposition 2: We first prove that the solution $\sigma_{\mathrm{tr}}^2$ to the coupled fixed-point equations (35) and (36) converges to zero in the high SNR limit for $\tau > \beta$. The proof is by contradiction. Suppose that $\sigma_{\mathrm{tr}}^2$ is strictly positive in the high SNR limit. Dividing both sides of (35) by $\sigma_{\mathrm{tr}}^2$ and taking the high SNR limit, we have

\[
1 = \frac{P \beta \sigma_{\mathrm{c}}^2}{\beta \sigma_{\mathrm{tr}}^2 \sigma_{\mathrm{c}}^2 + \tau P \sigma_{\mathrm{c}}^2 + (1 - \tau) \sigma_\theta^2 \sigma_{\mathrm{tr}}^2}. \tag{42}
\]

Rearranging (42), we obtain

\[
\sigma_{\mathrm{tr}}^2 = -\frac{(\tau - \beta) P \sigma_{\mathrm{c}}^2}{\beta \sigma_{\mathrm{c}}^2 + (1 - \tau) \sigma_\theta^2}. \tag{43}
\]

However, (43) is negative, due to $\tau > \beta$, which is a contradiction. Thus, the solution $\sigma_{\mathrm{tr}}^2$ must converge to zero in the high SNR limit for $\tau > \beta$. This result implies that the MSE (28) also converges to zero in the high SNR limit for $\tau > \beta$.

It is straightforward to show in a similar manner that the solution to the fixed-point equation (37) is $O(N_0)$ in the high SNR limit for $\alpha \le 1$ when the MSE (28) converges to zero. Thus, we have

\[
\liminf_{N_0 \to 0} \frac{C_g}{\log(P/N_0)} = 1 - \beta, \tag{44}
\]

which is equal to the full spatial multiplexing gain for $\alpha \le 1$ and $\beta \le 1/2$. ■
The proof of Proposition 2 indicates that in the first $M$ stages the performance of the SD receiver is limited by channel estimation errors, rather than inter-stream interference in MUD. This phenomenon is robust in the sense that it occurs regardless of the prior of data symbols.
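The dichotomy between $\tau < \beta$ and $\tau > \beta$ is easy to see numerically in the unbiased case. The sketch below is illustrative only and assumes $\sigma_\theta^2 = 0$, for which the fixed-point equation $\sigma_{\mathrm{tr}}^2 = N_0 + P\xi^2$ with $\xi^2 = \beta\sigma_{\mathrm{tr}}^2/(\beta\sigma_{\mathrm{tr}}^2 + \tau P)$ reduces to a quadratic; the function name is hypothetical.

```python
import math


def sigma_tr2_unbiased(tau, P, N0, beta):
    """Positive root of beta x^2 + (tau P - beta (N0 + P)) x - N0 tau P = 0."""
    b = tau * P - beta * (N0 + P)
    return (-b + math.sqrt(b * b + 4.0 * beta * N0 * tau * P)) / (2.0 * beta)


# As N0 -> 0 the root tends to (beta - tau) P / beta for tau < beta,
# and to zero for tau > beta, where channel estimation becomes accurate.
early_stage = sigma_tr2_unbiased(0.2, 1.0, 1e-9, 0.5)
late_stage = sigma_tr2_unbiased(0.8, 1.0, 1e-9, 0.5)
```

Here `early_stage` stays close to $(\beta - \tau)P/\beta = 0.6$ while `late_stage` is of the order of $N_0$, consistent with the observation that the stages with $\tau < \beta$ are limited by channel estimation errors.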

D. Low SNR Regime 

The power per information bit required for reliable communication is a key performance measure in the low SNR regime. Verdú [11] proved that the capacity $C_{\mathrm{opt}}$ of the MIMO channel (1) with no CSI is given by $C_{\mathrm{opt}} = NP/(N_0 \ln 2) + o(N_0)$ in the low SNR limit $N_0 \to \infty$, or $NE_b/N_0 \ge \lim_{N_0 \to \infty} NP/(N_0 C_{\mathrm{opt}}) = \ln 2 \approx -1.59$ dB. Since using multiple transmit antennas wastes valuable power in the low SNR regime, the number of transmit antennas used should be reduced as $N_0$ increases. One option is to increase $M^{-1}$ and $N_0$ at the same rate. Thus, we consider the limit in which $\alpha, \beta \to 0$ and $N_0 \to \infty$ while $\beta/\alpha$ and $s = P/(\beta N_0)$ are kept constant. The following proposition provides an upper bound on the normalized SNR $NE_b/N_0 = NP/(N_0 C)$ required for the optimal SD receiver, with $C$ denoting the achievable rate (8) of the optimal SD receiver.

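Proposition 3. Suppose that the optimal SD receiver achieves a rate $R/M$. Then, the normalized SNR $NE_b/N_0$ is bounded from above by

\[
NE_b/N_0 \le \frac{\beta s}{\alpha R} + o(N_0), \tag{45}
\]

in the limit where $\alpha, \beta \to 0$ and $N_0 \to \infty$ while $\beta/\alpha$ and $s = P/(\beta N_0)$ are kept constant. In (45), $s$ is implicitly given by

\[
R = \left[ 1 + \left( s + \frac{\beta}{\alpha} s^2 \right)^{-1} \right] \log\left( 1 + s + \frac{\beta}{\alpha} s^2 \right) - \left( 1 + \frac{1}{s} \right) \log(1 + s). \tag{46}
\]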



Proof of Proposition 3: Using the lower bound (39) for Gaussian signaling, we obtain an upper bound $NE_b/N_0 \le NP/(M N_0 C_g) = \beta s/(\alpha C_g)$. Thus, it is sufficient to prove that the maximum of $C_g$ with respect to $\tau_0$ and $\sigma_\theta^2$ is given by the right-hand side of (46).

We evaluate the solutions to the fixed-point equations (35), (36), and (37). It is straightforward to find that $\sigma_{\mathrm{tr}}^2/N_0$, $\sigma_{\mathrm{c}}^2/N_0$, and $\sigma^2/N_0$ tend to 1 in $N_0 \to \infty$, since (28) and (33) are bounded. This observation implies

\[
C_g \le \int_{\tau_0}^{1} \log\left[ 1 + \frac{\beta s}{\alpha} \left( 1 - \frac{\sigma_\theta^2}{P} \right) \left( 1 - \frac{1}{1 + s\left[ \tau + (1 - \tau)\sigma_\theta^2/P \right]} \right) \right] d\tau \tag{47}
\]

in the limit described in Proposition 3. In the derivation of (47), we have used Jensen's inequality. The equality holds only when $|\theta_{m,t}|^2$ takes $\sigma_\theta^2$ with probability one. It is easy to confirm that the integrand in (47) is monotonically decreasing with respect to $\sigma_\theta^2$. Thus, the maximum of $C_g$ is achieved at $\tau_0 = 0$ and $\sigma_\theta^2 = 0$ and given by

\[
\max C_g = \int_{0}^{1} \log\left[ 1 + \frac{\beta s}{\alpha} \left( 1 - \frac{1}{1 + s\tau} \right) \right] d\tau + o(N_0). \tag{48}
\]

Calculating the integration in (48), we find that the first term in the right-hand side of (48) is equal to the right-hand side of (46). ■
In the proof of Proposition 3 we have proved that the lower bound (39) is maximized at $\tau_0 = 0$ and $\sigma_\theta^2 = 0$ in the low SNR regime. This result implies that negligible pilot information is best in terms of the lower bound (39) in the low SNR regime.

It is interesting to note that the achievable rate (46) is approximated by $R = \beta s^2/(2\alpha \ln 2) + O(s^3)$ as $s \to 0$, which implies that $NE_b/N_0 \le \sqrt{2\beta \ln 2/(\alpha R)}$ to leading order in the low rate regime, i.e., the upper bound (45) diverges in $R \to 0$. In other words, the minimum of the upper bound (45) is achieved at a strictly positive rate, as shown in Section V. We remark that a similar result was reported in [52].

The reason why the minimum is achieved at a positive rate is that we have spread power over all time slots. 
It is well known that on-off keying is optimal in the low SNR regime, in other words, that spreading power over 
all time slots results in a waste of valuable power. If on-off keying was used, the normalized SNR required would 
reduce monotonically as the achievable rate decreases, as shown in [52]. However, on-off keying requires high peak-to-average power ratio (PAPR), which is unfavorable in practice. Thus, the minimum of the upper bound (45) may be interpreted as a practical performance bound in terms of energy efficiency.
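The low-SNR statements above can be checked numerically. The sketch below is an illustration under assumptions, not the paper's evaluation: it takes the rate as $R(s) = \int_0^1 \log_2[1 + (\beta s/\alpha)\, s\tau/(1 + s\tau)]\,d\tau$ (base-2 logarithm assumed, so that $R \approx \beta s^2/(2\alpha\ln 2)$ for small $s$), compares a midpoint-rule estimate with the closed form obtained from $\int_0^1 \ln(1 + c\tau)\,d\tau = (1 + 1/c)\ln(1 + c) - 1$, and evaluates the bound $\beta s/(\alpha R)$ in dB. The argument `ratio` stands for $\beta/\alpha$, and all function names are hypothetical.

```python
import numpy as np

LN2 = np.log(2.0)


def rate_closed_form(s, ratio):
    """Closed-form rate in bits, with c = s + (beta/alpha) s^2."""
    c = s + ratio * s * s
    nats = (1.0 + 1.0 / c) * np.log1p(c) - (1.0 + 1.0 / s) * np.log1p(s)
    return nats / LN2


def rate_integral(s, ratio, n=200000):
    """Midpoint-rule estimate of the same rate, for cross-checking."""
    tau = (np.arange(n) + 0.5) / n
    return float(np.mean(np.log2(1.0 + ratio * s * s * tau / (1.0 + s * tau))))


def ebn0_bound_db(s, ratio):
    """Upper bound beta s / (alpha R) on the normalized SNR, in dB."""
    return 10.0 * np.log10(ratio * s / rate_closed_form(s, ratio))


s_grid = np.logspace(-2, 2, 401)
bound_db = ebn0_bound_db(s_grid, 0.5)
best = int(np.argmin(bound_db))  # index of the minimizing s on the grid
```

On this grid the minimizer `s_grid[best]` lies strictly inside the range, and the corresponding rate `rate_closed_form(s_grid[best], 0.5)` is positive, which matches the claim that the minimum of the bound is attained at a positive rate.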

V. Comparison 

The lower bound (39) for the biased Gaussian signaling is compared to two existing bounds in this section. One is the HH bound [14], which corresponds to the achievable rate of receivers based on one-shot channel estimation, in which the decoded data symbols are not re-utilized for refining the channel estimates. The other one is the ZT approximation, which includes a deviation of $o(1)$ from the capacity at high SNR. In all numerical results, we choose $\tau_0 = 0$ since the lower bound (39) is maximized at $\tau_0 = 0$. In order to investigate the optimal choice of $\sigma_\theta^2$, we display the lower bound (39) for the biased Gaussian signaling with respect to $\sigma_\theta^2$ in Fig. 2. The lower bound (34) for the biased QPSK signaling is also shown in the same figure. We have used $p(\theta) = [\delta(\theta - \sigma_\theta) + \delta(\theta + \sigma_\theta)]/2$ as the distribution of $\theta_{m,t}$, i.e., $\theta_{m,t}$ takes $\pm\sigma_\theta$ with equal probability. We find that the lower bound for the biased Gaussian signaling is larger than that for the biased QPSK signaling for all $\sigma_\theta^2$. Furthermore, both lower bounds are monotonically decreasing as $\sigma_\theta^2$ grows. The latter observation implies that the lower bounds are maximized at $\sigma_\theta^2 = 0$, in other words, negligible pilot information is best in terms of the lower bound (34). Hereinafter, we consider the unbiased case $\sigma_\theta^2 = 0$.

Fig. 2. Achievable rate versus $\sigma_\theta^2$ for $P/N_0 = 6$ dB, $\alpha = 1$, and $\tau_0 = 0$.

Figure 3 provides a comparison between the lower bound (39) for unbiased Gaussian signaling and the HH bound in the moderate SNR regime. The lower bound (34) for QPSK modulation is also displayed. The HH bound is a lower bound on the capacity for finite-sized systems [14, Theorem 3]. We have used a large-system formula of the HH bound, which is easily derived in the same manner as in [21]. There is a significant gap of 1 dB to 1.8 dB between the lower bound for unbiased Gaussian signaling and the HH bound for all SNRs. Moreover, the HH bound is inferior even to the lower bound for QPSK modulation in the case of short coherence time ($\beta = 0.5$). These observations imply that SD receivers can provide a substantial performance gain in the moderate SNR regime, compared to receivers based on one-shot channel estimation, since they can reduce overhead for training significantly.

Fig. 3. Achievable rate versus SNR in the moderate SNR regime for $\alpha = 1$, $\sigma_\theta^2 = 0$, and $\tau_0 = 0$.

Next, we compare the lower bound for unbiased Gaussian signaling with the ZT approximation [10, Corollary 11] in the high-SNR regime in Fig. 4. The HH bound is also displayed in the same figure. Note that in the ZT approximation the large-system limit is taken after the high SNR limit. Thus, the comparison makes sense under the assumption that the large-system limit and the high SNR limit commute for the ZT approximation. We find that the ZT approximation is smaller than the lower bound for unbiased Gaussian signaling in the SNR region below 17.5 dB for $\beta = 0.5$ or below 20 dB for $\beta = 0.1$. This implies that the high SNR approximation derived by Zheng and Tse [10] is valid only for quite high SNR. The lower bound for unbiased Gaussian signaling is close to the HH bound, rather than the ZT approximation, in the quite high SNR regime, which indicates the suboptimality of Gaussian signaling in the quite high SNR regime.

Finally, we consider the low SNR regime and take the limit described in Proposition 3, in which the power per information bit $E_b$ required for reliable communication is a key performance measure. Figure 5 displays the upper bound (45) of $NE_b/N_0$ required for the SD receiver with unbiased Gaussian signaling as a function of the achievable rate per transmit antenna. The normalized SNRs $NE_b/N_0$ are also plotted for the SD receiver with QPSK modulation and for the HH bound. We find that $NE_b/N_0$ has a minimum at a positive achievable rate. This observation is due to the suboptimality of i.i.d. Gaussian signaling. If perfect CSI were available at the receiver, the normalized SNR $NE_b/N_0$ would be monotonically decreasing with the reduction of the achievable rate [11], since i.i.d. Gaussian signaling is optimal in that case. For the case of no CSI, however, the normalized SNR $NE_b/N_0$ diverges as the achievable rate tends to zero, since i.i.d. signaling wastes valuable power in the low SNR regime, as discussed in Section IV-D. Another observation is that there is a large gap of 1 dB and 1.5 dB between the minimal normalized SNRs for the SD receivers and the HH bound, while all bounds are far from the ultimate limit $NE_b/N_0 \approx -1.59$ dB. This result implies that the SD scheme can significantly improve the HH bound in the low SNR regime.

VI. Conclusions 

We have investigated the achievable rates of SD receivers for the Rayleigh block-fading MIMO channel with 
no CSI. An analytical formula for the achievable rates has been derived in the large-system limit, by using the 
replica method. It has been shown that negligible pilot information is best in terms of the information-theoretical 
achievable rate. From a theoretical point of view, the formula provides the best lower bound for the capacity among 
existing analytical lower bounds that can be easily evaluated for all SNRs, while it is far from the true capacity 
in the low or quite high SNR regimes. From a practical point of view, the analytical lower bound derived in 
this paper can be regarded as a fundamental performance limit for practical training-based systems with QPSK or 
multilevel modulation. We conclude that the SD receiver can reduce overhead for training significantly. Thus, it provides a substantial performance gain, compared to receivers based on one-shot channel estimation, especially in the low-to-moderate SNR regime.

Fig. 5. Normalized SNR versus achievable rate per transmit antenna (bps/Hz) in the low SNR regime for $\sigma_\theta^2 = 0$ and $\tau_0 = 0$. Curves: HH bound, QPSK, and Gaussian, each for $\beta/\alpha = 0.5$ and $\beta/\alpha = 0.1$.

One important future work is to investigate spatially correlated MIMO channels with no CSI. On the one hand, 
spatial correlations cause a reduction of diversity. On the other hand, they make it possible to estimate the channel 
matrix more accurately than without correlations, since one can utilize the knowledge about the correlations for 
channel estimation. Thus, it should be worth investigating impacts of these two effects onto the performance for 
training-based systems. It does not seem to be straightforward to extend the results presented in this paper to the 
case of spatially correlated MIMO channels. 
Appendix A
Derivation of Proposition 1

A. Sketch 

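Let us consider substage $m = \mu M$ within stage $t = \tau T_c$ in the SD receiver for $\tau_0 \le \tau \le 1$ and $0 \le \mu \le 1$. For some $\tilde{m} \in \mathbb{N}$, we decompose the lower bound (19) into two terms,

\[
\frac{1}{T_c M} \sum_{t = T_{\mathrm{tr}} + 1}^{T_c} \left[ \sum_{m=1}^{\tilde{m}} I(x_{m,t}; \tilde{x}_{m,t} | \mathcal{I}_t, x_{[1,m),t}, \theta_{[m,M],t}) + \sum_{m=\tilde{m}+1}^{M} I(x_{m,t}; \tilde{x}_{m,t} | \mathcal{I}_t, x_{[1,m),t}, \theta_{[m,M],t}) \right], \tag{49}
\]

and then take the limit in which $M$, $N$, $T_c$, $T_{\mathrm{tr}}$, $t$, $m$, and $\tilde{m}$ tend to infinity while their ratios $\alpha = M/N$, $\beta = M/T_c$, $\tau_0 = T_{\mathrm{tr}}/T_c$, $\tau = t/T_c$, $\mu = m/M$, and $\mu_0 = \tilde{m}/M$ are kept constant. The second term consists of the mutual information for the decoding problem after an extensive number of users have been decoded, while the first term contains the mutual information for the problem after a finite number of users have been decoded. We will show below that $I(x_{m,t}; \tilde{x}_{m,t} | \mathcal{I}_t, x_{[1,m),t}, \theta_{[m,M],t})$ for $\mu > \mu_0$ converges to the integrand in (34) in the large-system limit. The definition of the Riemann integral implies that the sum $(T_c M)^{-1} \sum_{t=T_{\mathrm{tr}}+1}^{T_c} \sum_{m=\tilde{m}+1}^{M}$ in the second term of (49) tends to $\int_{\tau_0}^{1} d\tau \int_{\mu_0}^{1} d\mu$. Taking the limit $\mu_0 \to 0$, we arrive at Proposition 1 since the first term of (49) tends to zero in $\mu_0 \to 0$.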
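The passage from the normalized double sum to the double integral can be illustrated with a toy integrand. The sketch below is purely illustrative: the function `double_riemann_sum` and the test integrand $f(\tau, \mu) = \tau\mu$ are hypothetical stand-ins for the per-substage mutual information, chosen so that the limit $\int_{\tau_0}^{1}\int_{\mu_0}^{1} \tau\mu\, d\mu\, d\tau = (1 - \tau_0^2)(1 - \mu_0^2)/4$ is known in closed form.

```python
def double_riemann_sum(f, T_c, M, T_tr, m_tilde):
    """Normalized sum of f(t / T_c, m / M) over t in (T_tr, T_c], m in (m_tilde, M]."""
    total = 0.0
    for t in range(T_tr + 1, T_c + 1):
        for m in range(m_tilde + 1, M + 1):
            total += f(t / T_c, m / M)
    return total / (T_c * M)


# tau_0 = T_tr / T_c = 0.1 and mu_0 = m_tilde / M = 0.2 are held fixed.
approx = double_riemann_sum(lambda tau, mu: tau * mu, 400, 400, 40, 80)
exact = (1.0 - 0.1 ** 2) * (1.0 - 0.2 ** 2) / 4.0
```

The gap between `approx` and `exact` shrinks as `T_c` and `M` grow with the ratios fixed, which is the sense in which the second term of the decomposition tends to the integral.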

The evaluation of $I(x_{m,t}; \tilde{x}_{m,t} | \mathcal{I}_t, x_{[1,m),t}, \theta_{[m,M],t})$ consists of two parts: analysis of the error covariance matrix (16) and analysis of the equivalent channel (20). We apply the replica method to both analyses.

B. Analysis of Channel Estimator 

We evaluate the error covariance matrix (16) for the LMMSE channel estimation in the large-system limit. Since the joint posterior pdf $p(H | \mathcal{I}_t)$ is decomposed into $\prod_{n=1}^{N} p(h_n | \mathcal{I}_t)$, without loss of generality, we focus on the estimation problem for the first row of $H$, denoted by $h_1$. The first row vector $y_{\backslash t,1} \in \mathbb{C}^{1 \times (T_c - 1)}$ of $Y_{\backslash t}$ is given by

\[
y_{\backslash t,1} = \frac{1}{\sqrt{M}} h_1 X_{\backslash t} + (0, w_{(t,T_c],1}) + n_{\backslash t,1}, \tag{50}
\]

where $w_{(t,T_c],1} \in \mathbb{C}^{1 \times (T_c - t)}$ and $n_{\backslash t,1} \in \mathbb{C}^{1 \times (T_c - 1)}$ denote the first row vectors of the matrices $W_{(t,T_c]}$ and $N_{\backslash t}$, respectively.

The channel (50) can be regarded as a MIMO channel with the channel matrix $X_{\backslash t}$ known to the receiver. The main difference between $X_{\backslash t}$ and zero-mean channel matrices considered in the previous works [20]-[22], [24], [26], [27] is that $X_{\backslash t}$ has the nonzero mean $\bar{X}_{\backslash t} = (0, \Theta_{(T_{\mathrm{tr}},t)}, \Theta_{(t,T_c]}) \in \mathbb{C}^{M \times (T_c - 1)}$ conditioned on $\Theta$. Let us decompose the channel matrix $X_{\backslash t}$ into the mean $\bar{X}_{\backslash t}$ and the difference $X_{\backslash t} - \bar{X}_{\backslash t}$. The problem would reduce to the zero-mean case if the two matrices were independent of each other, since the sum of two independent matrices with zero-mean i.i.d. entries is also a matrix with zero-mean i.i.d. entries. However, the two matrices are not independent while they are uncorrelated zero-mean matrices. Thus, we have to treat the influence of higher-order correlations carefully.



The following proposition implies that the large-system results for each element of the error covariance matrix (16) coincide with those for the case in which the two matrices $\bar{X}_{\backslash t}$ and $X_{\backslash t} - \bar{X}_{\backslash t}$ are mutually independent. In other words, higher-order correlations between the two matrices do not affect the results for each element of the error covariance matrix (16) in the large-system limit. Note that we do not claim the norm convergence $\|\Xi_t - \xi^2(\tau) I_M\| \to 0$.

Proposition 4. Suppose that Assumption 1 and the RS assumption hold. Then, each diagonal element of the error covariance matrix (16) converges in probability to (28), defined by (35) and (36), in the large-system limit. Furthermore, each off-diagonal element of the error covariance matrix (16) converges in probability to zero in the large-system limit.

Proof: See Appendix B. ■

Proposition 4 was rigorously proved without Assumption 1 for the unbiased case $\theta_{m,t} = 0$ in [22]. Since we cannot claim the norm convergence $\|\Xi_t - \xi^2(\tau) I_M\| \to 0$, a careful treatment of $\Xi_t$ is required in the analysis of the equivalent channel (20).

We remark that the convergence of each off-diagonal element to zero results from the fact that the MMSE estimate $\hat{h}_1$ of $h_1$ and the error $h_1 - \hat{h}_1$ are uncorrelated with each other. In fact, we can show a stronger result for the off-diagonal elements without the replica method.

Lemma 1. Suppose that Assumption 1 holds. For a constant $A > 0$, each off-diagonal element $(\Xi_t)_{\tilde{m},\tilde{m}'}$ ($\tilde{m} \ne \tilde{m}'$) of (16) satisfies

\[
\limsup_{M \to \infty} M^{3/4} |(\Xi_t)_{\tilde{m},\tilde{m}'}| \le A \quad \text{in probability}, \tag{51}
\]

where the limit denotes the large-system limit.

Proof: We use the fact that the covariance matrices $\Xi_t$ and $I - \Xi_t$ for $h_1 - \hat{h}_1$ and $\hat{h}_1$ are positive definite. Let $\{\lambda_m > 0 : m = 1, \ldots, M\}$ denote the eigenvalues of $\Xi_t$. The positive definiteness of $I - \Xi_t$ implies $1 - \lambda_m > 0$ for all $m$, or $0 < \lambda_m < 1$ for all $m$. This observation implies that $I_M - \Xi_t^k$ is also positive definite for any $k \in \mathbb{N}$, or

\[
\limsup_{M \to \infty} \frac{1}{M} \mathrm{Tr}(\Xi_t^k) \le 1 \tag{52}
\]

in the large-system limit. Note that Assumption 1 implies the left-hand side of (52) tends to the expected one $M^{-1} \mathrm{Tr}(\mathbb{E}[\Xi_t]^k)$.

In order to prove Lemma 1 we evaluate $\mathrm{Tr}(\mathbb{E}[\Xi_t]^4)$. Let $\rho_t$ denote the strictly upper triangular elements of $\mathbb{E}[\Xi_t]$. A direct calculation implies that the leading term of $\mathrm{Tr}(\mathbb{E}[\Xi_t]^4)$ is given by $|\rho_t|^2 M^4 (2|\rho_t|^2 - \Re[\rho_t^2])/3$ in $M \to \infty$. Applying this result and $\Re[\rho_t^2] \le |\rho_t|^2$ to (52), we have

\[
\limsup_{M \to \infty} M^3 |\rho_t|^4 \le 3, \tag{53}
\]

which implies that Lemma 1 holds. ■

Lemma 1 implies $(\Xi_t)_{\tilde{m},\tilde{m}'} = O(M^{-3/4})$ in the large-system limit. We believe that it is possible to prove $(\Xi_t)_{\tilde{m},\tilde{m}'} = O(M^{-1})$ by calculating $\mathrm{Tr}(\mathbb{E}[\Xi_t]^k)$ and taking $k \to \infty$ after the large-system limit. However, Lemma 1 is sufficient for deriving Proposition 1.

C. Analysis of Detector 

We focus on substage $m$ within stage $t$ and analyze the equivalent channel (20) in the large-system limit. It is shown that the equivalent channel reduces to a MIMO channel with perfect CSI at the receiver in the large-system limit. Let $\Xi_t^{(m)} \in \mathbb{C}^{(M-m+1) \times (M-m+1)}$ denote the posterior covariance matrix of $(h_{1,m}, \ldots, h_{1,M})^T \in \mathbb{C}^{M-m+1}$ given $\mathcal{I}_t$, i.e., the bottom-right block of the error covariance matrix (16),

\[
\Xi_t = \begin{pmatrix} * & * \\ * & \Xi_t^{(m)} \end{pmatrix}. \tag{54}
\]

The equivalent MIMO channel with perfect CSI at the receiver is defined as

\[
z = \frac{1}{\sqrt{\alpha}} \sqrt{I - \Xi_t^{(m)}}\, x_{[m,M],t} + w, \tag{55}
\]

with $w \sim \mathcal{CN}(0, \sigma^2 I_{M-m+1})$. In (55), the matrix $\sqrt{I - \Xi_t^{(m)}}$ denotes a square root of $I - \Xi_t^{(m)}$, i.e., $I - \Xi_t^{(m)} = \sqrt{I - \Xi_t^{(m)}}^{H} \sqrt{I - \Xi_t^{(m)}}$.

The equivalent channel between $x_{m,t}$ and the associated decoder for the MIMO channel (55) with perfect CSI at the receiver is given by

P{im.t\^m,t,'^t^\d[m,M],t) 

P{xm.t = im,t|^,3r\^[m3/],t)pU|3i''\a;[m,A/],t)p(a;(m,Af],tl^(m,M],t)c?a'(m,M],trf^, (56) 

where p{xm,t\z,a^t\d^m^M],t) represents the pdf of Xm,t conditioned on z, 'B^t \ and 0[m,A/],t. 

Proposition 5. Suppose that Assumption 2 and the RS assumption hold. Then, the equivalent channel (20) converges in law to the equivalent channel (56) for the MIMO channel (55) with perfect CSI at the receiver in the large-system limit. In evaluating (56), the variance $\sigma^2$ of $w$ is given as the solution to the fixed-point equation,

a^=No+ lim -^TriSt) + V{a^), (57) 



with 

where {x[„i M],t) denotes the mean of X[,n,M],t with respect to the posterior pdf p{x[„i M],t\z,s['^\6[„i ]^j^ t). If 
there are multiple solutions, one should choose the solution minimizing the following quantity 

(59) 

where I{x[„^]^.J^t^,z) denotes the mutual information between a;[m,A/],t and z given realizations of at and 6\^„ij^^f 
Proof: See Appendix Icl ■ 



Jim ^J{x[„,,M\,uz) + - 
Af->oo M a 



i?2(A^o|k^)+ lim i^Tr(H,) 

A/-i.oo cr^M 
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We have implicitly assumed that the equivalent channel ( |56l ) and the last two terms in ( |57l ) converge as M — > oo. 
This assumption is justified below by using Proposition |4] and Lemma [T] 

Proposition |5] implies that the mutual information /(xm,*; im.tl^t, a^[i,m),t; ^[m,A/],t) tends to the constrained 
capacity I{xm.t', S;m,t\'^'f\ ^[m,A/],t) of the MIMO channel ( fSSl ) with perfect CSI at the receiver in the large-system 
limit. In order to complete the derivation of Proposition [T] we show that I{xm,t', S;m.t\'^t^\ S[m.M].t) tends to the 
integrand in (l34b . A proof of this statement is given in Appendix iDl One expects that if the convergence of each off- 
diagonal element of the error covariance matrix ( fT6] l to zero is fast enough, the off-diagonal elements of the channel 



matrix y I - ^l"^ for the MIMO channel (|55]) with perfect CSI at the receiver are negUgible. Thus, the MIMO 
channel (l55l l with perfect CSI at the receiver is decoupled into the bank of the AWGN channels dSTT l. The proof 
presented in Appendix ID] implies that the convergence speed shown in Lemma [T] i.e., (sj'^^)™.™' — 0{M^^/^) 
for ifi 7^ rh' is fast enough. In order to explain this argument intuitively, we apply the matched filter (MF) 



-H 

r — a^^^^\/ 1 — Hj^'' z for the received vector z of the MIMO channel ( fSSl l with perfect CSI at the receiver, 

r = -^Xm.t + V -^Xm',t + r], (60) 
a — ' a 

?n'— m+l 

with r) ^ CJ\f(0, (7^(7— h|'^'') /a). In ( l60t . the vector ^, denotes the (m' — m+ l)th column vector of J — sj'^'' for 
$m' = m, \ldots, M$. Note that the MF output vector (60) contains sufficient information for the estimation of $x_{[m,M],t}$.
The magnitude of the inter-stream interference, given by the second term on the right-hand side of (60), would be
proportional to the product of $(M - m)$ and the magnitude of each interfering signal if a constructive superposition
of all interfering signals occurred. However, with high probability this does not occur, owing to the independence of the data symbols.
On average, the magnitude of the inter-stream interference is proportional to the product of $\sqrt{M - m}$
and the magnitude of each interfering signal. Since the magnitude of each interfering signal is $O(M^{-3/4})$, the
magnitude of the inter-stream interference is $O(M^{-1/4})$. Thus, the inter-stream interference is negligible in the large-system
limit.
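The square-root growth invoked above can be checked numerically. The following is a minimal sketch, assuming unit-magnitude interfering terms with independent, uniformly distributed phases (a toy stand-in for the data symbols, not the signaling scheme of the paper); the function name `mean_interference_magnitude` is hypothetical.

```python
import cmath
import math
import random


def mean_interference_magnitude(num_terms: int, trials: int, rng: random.Random) -> float:
    """Average |sum| of num_terms unit-magnitude terms with independent uniform phases."""
    total = 0.0
    for _ in range(trials):
        # Each interfering term has magnitude 1 and a random phase (toy assumption).
        s = sum(cmath.exp(2j * math.pi * rng.random()) for _ in range(num_terms))
        total += abs(s)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (16, 64, 256):
        avg = mean_interference_magnitude(k, 2000, rng)
        # Compare with sqrt(K) (incoherent sum) and K (fully constructive sum).
        print(f"K={k:4d}  mean |sum| = {avg:7.2f}  sqrt(K) = {math.sqrt(k):6.2f}  K = {k}")
```

Under these assumptions the average magnitude tracks $\sqrt{K}$ rather than $K$, which is the scaling used in the intuitive argument: $\sqrt{M}$ interferers of size $O(M^{-3/4})$ add up to $O(M^{-1/4})$.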

We have so far presented the derivation of Proposition 1. Finally, we discuss the performance degradation caused
by using the LMMSE channel estimator. Let us consider the estimation problem for the first row vector $h_1$ of $H$
based on the first row vector of the received matrix $Y_{\backslash t}$, instead of the whole matrix $Y_{\backslash t}$. It is worth noticing the similarity between
this problem and the detection problem of $x_{m,t}$ in stage $t$. This similarity allows us to analyze the performance of
the optimal channel estimator (9) in the large-system limit.

Proposition 6. Suppose that the error covariance matrix for the optimal channel estimator (9) is self-averaging
in the large-system limit. Then, under the RS assumption, each diagonal element of the error covariance matrix
converges in probability to the same value as that for the LMMSE channel estimator, defined in Proposition 4, in
the large-system limit.

The derivation of Proposition 6 is omitted since it follows straightforwardly by combining the methods for
deriving Propositions 4 and 5. Proposition 6 allows us to expect that the gap between the achievable rate of the
optimal SD receiver and its lower bound (34) may be quite small in the large-system limit, although we cannot
immediately conclude that the lower bound (34) is tight in the large-system limit.



Appendix B 
Derivation of Proposition 4



A. Formulation 



It is sufficient from Assumption 1 to show that the averaged quantities $\xi_t^2 = M^{-1}\sum_{m=1}^{M}\mathbb{E}[(\Xi_t)_{m,m}]$ and
$\rho_t = (M-1)^{-1}\sum_{m=2}^{M}\mathbb{E}[(\Xi_t)_{1,m}]$ converge to $\xi^2(\tau)$ and zero in the large-system limit, respectively, in which $M$,
$T_c$, $T_{tr}$, and $t$ tend to infinity while $\beta = M/T_c$, $\tau_0 = T_{tr}/T_c$, and $\tau = t/T_c$ are kept constant.

For notational convenience, hereinafter, we drop the subscript 1 in (50) from all letters. For example, $y_{\backslash t,1}$ and
$h_1$ are written as $y_{\backslash t}$ and $h$, respectively. Let $h^{(a)} = (h_1^{(a)}, \ldots, h_M^{(a)}) \in \mathbb{C}^{1\times M}$ denote replicas of $h$ for $a \in \mathbb{N}$:
$\{h^{(a)} : a \in \mathbb{N}\}$ are i.i.d. random vectors drawn from $p(h)$. Furthermore, we write $h$ as $h^{(0)} = (h_1^{(0)}, \ldots, h_M^{(0)})$.
The replica analysis is based on the following lemma.



Lemma 2. Let us define a function $Z_n(\omega; f)$ as



, (61) 



with a complex function f of {h }. In l\61i , p{y\^^\h, X\f) represents the virtual MIMO channel jSOj . For n> 

and Lu £ R, 



Id 
1 d 

pt = lim lim — — In Z„ (w ; /2 ) , 



(62) 
(63) 



where the functions $f_1$ and $f_2$ are given by $f_1 = M^{-1}\sum_{m=1}^{M} f_{m,m}$ and $f_2 = (M-1)^{-1}\sum_{m=2}^{M} f_{1,m}$, respectively,
with

f,n,,, = {h^y4Y)ih'i]-hi'jr. (64) 

Proof We only present the proof of ( |62] l since the proof of ( |63] l is the same as that of ( |62] i. Let /i^ e (j^ixAf 
denote the first row vector of the LMMSE estimate (fTsT l. i.e., the mean of h with respect to p{h\It). Then, we 
have 



1 

M' 



-E 



(65) 



where we have used the fact that the error covariance matrix ( fT6] l is the posterior covariance of h given Xf The 
introduction of a non-negative real number n gives 



?7= lim I 

n— >+0 



It is relatively easy to confirm that (62) is equivalent to (66), since $Z_n(\omega; f_1) \to 1$ in $n, \omega \to 0$.



(66) 
It is difficult to evaluate (61) for a real number n. The main trick of the replica method is that n is regarded as
a non-negative integer in evaluating (61). For n = 2, 3, . . ., we have a simple expression of (61),



/■ " ^ ^(a) 



(67) 



In order to use Lemma 2, we have to take the operations with respect to $\omega$ before the large-system limit. However,
we need to take the operations after the large-system limit, since it is possible to get an analytical formula of
(67) only in the large-system limit, as shown in the next section. We circumvent this dilemma by assuming the
commutativity of the large-system limit and the operations.
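The generic identity behind the replica trick, $\lim_{n\to+0} n^{-1}\ln \mathbb{E}[Z^n] = \mathbb{E}[\ln Z]$, can be illustrated on a toy example. The sketch below assumes a discrete positive random variable $Z$ with hypothetical values and probabilities; it is not the function $Z_n(\omega; f)$ of Lemma 2, only the elementary identity that motivates continuing $n$ from the integers to real numbers near zero.

```python
import math


def replica_estimate(values, probs, n):
    """(1/n) * ln E[Z^n] for a discrete positive random variable Z."""
    moment = sum(p * z ** n for z, p in zip(values, probs))
    return math.log(moment) / n


def quenched_average(values, probs):
    """E[ln Z], the quantity that the limit n -> +0 is meant to recover."""
    return sum(p * math.log(z) for z, p in zip(values, probs))


if __name__ == "__main__":
    values = [0.5, 2.0, 7.0]   # hypothetical support of Z
    probs = [0.2, 0.5, 0.3]    # hypothetical probabilities
    print("E[ln Z] =", quenched_average(values, probs))
    for n in (2, 1, 0.1, 0.01, 0.001):
        print(f"n={n:<6} (1/n) ln E[Z^n] = {replica_estimate(values, probs, n):.6f}")
```

For integer $n$ the moment $\mathbb{E}[Z^n]$ is the tractable object; the difficulty addressed by Assumption 3 and the RS assumption is that the large-system formula must then be extended to real $n$ before the limit is taken.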



Assumption 3. For a non-negative integer n. 



1 d \ 

lim lim — 7— lnZ„(aj; f) = lim —- lim — lnZ„(a;; f), 



(68) 



where $\lim_{M\to\infty}$ denotes the large-system limit.



An analytical formula of (67) obtained in the large-system limit is not generally defined for n > 0. In order to
predict the correct asymptotic formula of (61) in a neighborhood of n = 0, we will assume symmetric statistics
with respect to replica indices, called replica symmetry (RS). Assuming that the order of the large-system limit and the operations
with respect to $n$ and $\omega$ in (62) and (63) is commutative, we obtain analytical expressions of (62) and (63) in the
large-system limit. It is a challenging open problem to prove whether these assumptions are valid or whether the
obtained result is correct.



B. Average over Quenched Variables 

In this section, we evaluate the expectations in (67) with respect to $X_{\backslash t} = (X_{T_{tr}}, X_{(T_{tr},t)}, \Theta_{(t,T_c]})$ and $\Theta_{(T_{tr},t)}$.
The matrix $X_{\backslash t}$ consists of three kinds of random vectors: $\{x_{t'}\}$ for $t' = 1, \ldots, T_{tr}$ are the pilot symbol vectors,
$\{x_{t'}\}$ for $t' = T_{tr}+1, \ldots, t-1$ are the data symbol vectors decoded in the preceding stages, and $\{\theta_{t'}\}$ for
$t' = t+1, \ldots, T_c$ are the bias vectors for the data symbol vectors unknown in the current stage. Since the elements
of $y_{\backslash t}$, given by (50), are mutually independent conditioned on $\mathcal{H} = \{h^{(a)} : a = 0, 1, \ldots\}$ and $X_{\backslash t}$, (67) yields



Tt, 



with 



t-Ttr-l 



r 



e„(w(°\cr2,?^) =E 



, (69) 



(70) 



where g{y] ) denotes the pdf of a proper complex Gaussian random variable y E C with mean v^°-' and 



variance cr^. In ( |69] l. cTj, is given by a^, — M ^J2^=i l^m.t'P- Furthermore, Wp"-* e 



(q) 



and Wc"'' e 



are given by 



A/ 



m— 1 



(71) 
M

m— 1 
M 



(a) 



'^m.t — l 1 



M ^ , 

m— 1 



(a)/) 



(72) 

(73) 



respectively. 

We first evaluate en{v^\ Nq^H) in the large-system limit, following ll24l . fTT) . Calculating the Gaussian 
integration with respect to y, we obtain 



E 



n 



(7rA^o)"(l + n) 



with Vp = (fp"'', . . . , Vp''')'^ and 



n 
1, 



(74) 



(75) 



(1 + n) 

As $M \to \infty$, due to the central limit theorem, $v_p$ conditioned on $\mathcal{H}$ converges in distribution to a circularly
symmetric complex Gaussian random vector with the covariance matrix $PQ$, given by

1 



(76) 



with hm = {h. 



(0) 



,4:^)^. Thus, 



(77) 



er^{vl-\No,n) - exp |g (^^Q^ | + 0{M-'), 
in the large-system limit, in which the function G{Q) is given by 

$$G(Q) = -\ln\det(I_{n+1} + AQ) - n\ln(\pi N_0) - \ln(1+n). \tag{78}$$
We next calculate en{vc'^\ Nq + P — a'l,,'H). Expanding it with respect to the difference af, — ag, we have 

e„ iVo + P- al,H) = e^{vi''\No + P-al,n) + 0{M-^'^), (79) 



in the large-system limit, since the standard deviation of cr^, — is 0{1/ \/ M). In the same manner as in the 
derivation of dTTb . we have 

e„(z;('^),iVo + P- 4,n) = cxp |g (^^^—^l—^Q^ | + 0(Af-i/2). (gQ) 
The quantity e„(wj"'', A^o, "H) is different from the other two quantities since = iv'-^\...,v[^^f has the 

/ (0) 

nonzero mean ve = [Vg , 



,4"^)^, with 



(a) 



M 

7\f 



(81) 



The difference t>d — conditioned on H and 0f_i converges in distribution to a circularly symmetric complex 
Gaussian random vector with the covariance matrix = PQ — X]ri^=i \(^m.t-i\^hmh^ in the large-system 
limit. We first take the expectation with respect to Xt-i to obtain 



n 



O(M-i), 



(82) 
with



B{Q,A) = Q^ Q \A + Q 



Q 



(83) 



In order to eliminate the dependency of 6m-i on Q^, we use || Q^j ^ {P ^ <^0)Q\\ = 0(1/ vM) in the large-system 
limit. Expanding the exponent in (|82] | around — {P — o'g)Q, we obtain 



Nn 



n 



o{Ar 



-1/2N 



(84) 



where we have used the identity B{Q,Nn ^A) = ^B{Nq ^Q,A). Applying the central limit theorem with 
respect to vg to (|84] |. after some calculation, we arrive at 

P 



eM;\No,H) = exp ^ G ( ) ^ + 0{M-'/'). 



(85) 



It is interesting to compare (77) for the pilot symbols and (85) for the data symbols decoded in the preceding
stages. These expressions imply that random biases do not contribute to the performance of channel estimation in
the leading order.

We substitute (77), (80), and (85) into (69) to obtain



in the large-system limit, with 



1 - r 



-G 



13 - yNo + P-a, 
Differentiating (86) with respect to $\omega$ and using Assumption 3, we have



1 d 

lim lim — -— In Z„(w: f) = lim 



E 


J mi ,7712'^ 


E 


qMG{Q) 





(86) 



(87) 



(88) 



with $f_{m_1,m_2} = f_{1,1}$ for $f = f_1$ and $f_{m_1,m_2} = f_{1,2}$ for $f = f_2$. Expression (88) implies that the problem of
evaluating (62) and (63) reduces to that of evaluating (88) for $f = f_{m_1,m_2}$.



C. Average over Spin Variables 

In this section, we take the expectation in (88) with respect to $\mathcal{H}$, following [53]. For notational convenience,
we define a set $\mathcal{M} = \{m_1, m_2\}$ of integers. We first evaluate the conditional pdf of $Q$ given $\mathcal{H}_{\mathcal{M}} = \{h_m :
m \in \mathcal{M}\}$,



/.(Q) =E 



/ M 



(89) 



It might be possible to obtain the analytical expression of the pdf (89) since $Q$ is a Wishart matrix. However, we
derive an asymptotic expression in the large-system limit by using the inversion formula for the moment generating
function $F(\tilde{Q})$ of $Q$ given by

$$F(\tilde{Q}) = \mathbb{E}\left[ e^{M\mathrm{Tr}(Q\tilde{Q})} \,\middle|\, \mathcal{H}_{\mathcal{M}} \right], \tag{90}$$
where a positive definite $(n+1) \times (n+1)$ Hermitian matrix $\tilde{Q}$ is given by



Q = 



qofi 
1 ~* 

2% A 



\ Wo, 



990,1 



In 

2 Hn—l.n 



2 Q'ri— 1,1 



(91) 



The inversion formula for moment generating functions implies 



(92) 



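with $d\tilde{Q} = \prod_{a=0}^{n} d\tilde{q}_{a,a} \prod_{a<a'} \{ d\Re[\tilde{q}_{a,a'}]\, d\Im[\tilde{q}_{a,a'}] \}$. In (92), the integrations with respect to $d\tilde{q}_{a,a}$, $d\Re[\tilde{q}_{a,a'}]$, and
$d\Im[\tilde{q}_{a,a'}]$ are taken along the imaginary axes from $-j\infty$ to $j\infty$, respectively. Since $\{h_m\}$ are i.i.d. for all $m$, the
moment generating function (90) reduces to $\{F_1(\tilde{Q})\}^{M-|\mathcal{M}|} \prod_{m\in\mathcal{M}} \exp(h_m^H \tilde{Q} h_m)$, with $F_1(\tilde{Q})$ given by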



J^i(Q) 

The substitution of this expression into ( |92] ) gives 



n 



with 



$I(Q,\tilde{Q}) = \mathrm{Tr}(Q\tilde{Q}) - \left(1 - \frac{|\mathcal{M}|}{M}\right)\ln F_1(\tilde{Q}).$



(93) 



(94) 



(95) 



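In order to obtain an analytical expression of (94), we use the saddle-point method. Let us define $\tilde{q} \in \mathbb{R}^{(n+1)^2}$
as $\tilde{q} = (\tilde{q}_0^T, \ldots, \tilde{q}_n^T)^T$, given by $\tilde{q}_a = (\tilde{q}_{a,a}, \Re[\tilde{q}_{a,a+1}], \Im[\tilde{q}_{a,a+1}], \ldots, \Re[\tilde{q}_{a,n}], \Im[\tilde{q}_{a,n}])^T \in \mathbb{R}^{2(n-a)+1}$. Expanding
(95) with respect to $\tilde{Q}$ around the saddle-point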



Qs= argsup lim I{Q,Q), 



with $\mathcal{M}_{n+1}^{+}$ denoting the space of positive definite $(n+1) \times (n+1)$ Hermitian matrices, we have

{n+lf 



(96) 



/i(Q) 



27r 



^hgQ,h,n^-MIiQ.Q,) 



(r.+ l)2 



exp<jiq^V|/(Q,QJq;« 1 + 0(1/VM) dq 



(97) 

where $\nabla_{\tilde{q}}^2 I(Q, \tilde{Q}_s)$ denotes the Hesse matrix of $I(Q, \tilde{Q})$ with respect to $\tilde{q}$. In the derivation of (97), we have
transformed the variable $\tilde{Q}$ into $\tilde{Q}' = \sqrt{M}(\tilde{Q} - \tilde{Q}_s)/j$ and then rewritten $\tilde{Q}'$ as $\tilde{Q}$. The Hesse matrix $\nabla_{\tilde{q}}^2 I(Q, \tilde{Q}_s)$ is
negative definite since the cumulant generating function $\ln F_1(\tilde{Q})$ is convex. Thus, we can perform the Gaussian
integration in (97) to obtain



[n+lf 



|det{V?/(Q,Qs)}r' n e''-<5.?^*e-*^^('?'^=) [l + 0(1/Vm) 



(98) 
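The saddle-point step can be illustrated in one dimension. The sketch below is a toy check, assuming the hypothetical exponent $\phi(q) = \cosh q - 1$ (unique minimum at $q = 0$ with $\phi''(0) = 1$); it compares a numerical value of $\int e^{-M\phi(q)}dq$ with the leading-order saddle-point value $\sqrt{2\pi/M}$. It is not the matrix-valued integral of this section, only the scalar mechanism behind it.

```python
import math


def phi(q: float) -> float:
    """Toy exponent with a unique minimum at q = 0 and phi''(0) = 1."""
    return math.cosh(q) - 1.0


def integral_numeric(M: float, half_width: float = 6.0, steps: int = 200_000) -> float:
    """Midpoint-rule value of the integral of exp(-M * phi(q)) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return h * sum(math.exp(-M * phi(-half_width + (i + 0.5) * h)) for i in range(steps))


def integral_saddle_point(M: float) -> float:
    """Leading-order saddle-point value: exp(-M * phi(0)) * sqrt(2 * pi / (M * phi''(0)))."""
    return math.sqrt(2.0 * math.pi / M)


if __name__ == "__main__":
    for M in (10.0, 50.0, 200.0):
        ratio = integral_numeric(M) / integral_saddle_point(M)
        print(f"M={M:6.1f}  numeric / saddle-point = {ratio:.6f}")
```

In this toy case the relative correction decays as $M$ grows, which mirrors the role of the vanishing correction factors attached to the saddle-point expressions in the text.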
We next calculate the numerator in (88) by using the pdf (98). Substituting (98) into the quantity $\mathbb{E}[f_{m_1,m_2} e^{MG(Q)}]$
and then using the saddle-point method, we have



E 



/mi,m2^ ^'^^ — C'n(Q,Qs)e 



1 + 0{1/VM) 



(99) 



with $((5) = /(g, QJ - G(Q) and C„(Q, QJ = | det{V?/(Q, QJIT^ det{V2$(QJ}-i. In (l99]i, denotes 
the saddle-point 

= arginf lim $(Q). (100) 

Furthermore, $\nabla^2\Phi(Q_s)$ represents the Hesse matrix of $\Phi(Q)$ at the saddle-point $Q = Q_s$, and is assumed to be
positive definite.

Similarly, we can obtain an analytical expression of the denominator in (88). Substituting the obtained expression
and (99) into (88), we arrive at



1 c) 

lim lim — -— lnZ„fw; f) 



(101) 



with $f_{m_1,m_2} = f_{1,1}$ for $f = f_1$ and $f_{m_1,m_2} = f_{1,2}$ for $f = f_2$.

The calculation of the stationarity conditions for (96) and (100) implies that $(Q_s, \tilde{Q}_s)$ is given as the solution
to the coupled fixed-point equations



Q = 



E 


hi 






E 






1— ' 







/3(iVo + P- al) 



No + P-ai 



AQ A. 



(102) 



(103) 



D. Replica Symmetry 

The expression (101) is defined only for $n \in \mathbb{N}$, since $(n+1)$ is the dimension of $Q$ and $\tilde{Q}$. In order to obtain
a formula of (101) defined for $n \in \mathbb{R}$, we assume RS for the solution to the coupled fixed-point equations (102)
and (103).



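Assumption 4. The solution $(Q_s, \tilde{Q}_s)$ is invariant under all permutations of replica indices:

$$Q_s = \begin{pmatrix} a & b\,1_n^T \\ b^{*}1_n & (d-c)I_n + c\,1_n 1_n^T \end{pmatrix}, \tag{104}$$

$$\tilde{Q}_s = \begin{pmatrix} \tilde{a} & \tilde{b}\,1_n^T \\ \tilde{b}^{*}1_n & (\tilde{d}-\tilde{c})I_n + \tilde{c}\,1_n 1_n^T \end{pmatrix}. \tag{105}$$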
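One reason the RS structure is convenient is that quantities built from the block $(d-c)I_n + c\,1_n 1_n^T$ have closed forms in which $n$ appears only as a parameter. The sketch below, with hypothetical values of $d$ and $c$, checks the standard determinant formula $(d-c)^{n-1}(d-c+nc)$ for integer $n$ and then evaluates the same formula at non-integer $n$. It covers only the lower-right $n \times n$ block, not the full $(n+1)\times(n+1)$ matrices in (104) and (105).

```python
def rs_block(n: int, d: float, c: float):
    """n x n replica-symmetric block: d on the diagonal, c elsewhere."""
    return [[d if i == j else c for j in range(n)] for i in range(n)]


def determinant(matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def rs_det_closed_form(n: float, d: float, c: float) -> float:
    """Closed form (d-c)^(n-1) * (d-c+n*c); also meaningful for real n when d > c."""
    return (d - c) ** (n - 1) * (d - c + n * c)


if __name__ == "__main__":
    d, c = 2.0, 0.5  # hypothetical values
    for n in range(1, 6):
        print(n, determinant(rs_block(n, d, c)), rs_det_closed_form(n, d, c))
    print("n = 0.5:", rs_det_closed_form(0.5, d, c))
    print("n = 0:  ", rs_det_closed_form(0.0, d, c))
```

The closed form tends to 1 as $n \to 0$, a toy instance of the normalization that makes the limit $n \to +0$ well behaved once RS is assumed.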
We first evaluate the fixed-point equation (103). Let us define $(\sigma_{tr}^{(0)})^2$, $\sigma_{tr}^2$, $(\sigma_c^{(0)})^2$, and $\sigma_c^2$ as

{a'-^f = No + P{a~b-b* + c), 
al^No + P{d-c), 

((jf") )2 ^ No + P-a^ + cT^ia-b-b* +c), 
al = No + P - al + al{d - c), 



(106) 



respectively. After some calculation, we obtain 

tP n 



6 = 



rP {<ri'Jr 



(0)n2 



{l-r)al (an 



We next evaluate the fixed-point equation (102) by calculating $e^{h^H\tilde{Q}h}$ with (107).



(107) 



^h^Qh^ = exp . 



T 



;i2 



" L (a) 
a=0 If'^tr J 



a=0 l^tr J 



a=0 K^c ) 



(108) 

with = (ncj-.^ + and = {n<j-^ + {<ji"^)-^)-\ In (HOSj, (ct^"^)^, and (ct^^V are given by 

the identity 



(cr^J''')^ = a^j., and (ctc'^')^ = for a = 1, . . . , n. In order to linearize the two quadratic forms in (1108b . we use 



1 



■+a y+ay 



dy, 



(109) 



for y - e C or y = e C. Substituting (UMl) with (a, a^) = ( V^P/^ELo CV(<0', '^t' ) or (a, a^) = 
(v/(l - ^)^.V/3E:=o 4"V(<^^'^^)^ into (IIM), we have 

/ri 
n<z(yl4"Vy, (110) 

With Dr. = (^2^2^^2)„(i+„(^(o))2/^2j(l^„(^(o))2/^2)^ I„ gjoj, the function for y = {y.^.y^f G 

is defined as 



with 



— /i^'^^-a'"^ \ a\v 



1 iH-fel' 

^(y ft.;cr) = — 2^ 



(1-T)g^^(a)__(„ 



Applying the expression jllOl) to (I102l l. we arrive at 



a — 6 — 6* + c 



E 


/ 




%(y|/j(°)){E^a) 




} dy 


E 


/g(y|/xl°)){E,a) 




~i " 1 
} dy 





(111) 



(112) 



(113) 
d — C = 



E 


K 




'^g(y|<)){E^a, 




} dy 


E 


/g(y|/i(")){E,^a, 


'<iiy\h['^) 


n " 1 

} 





with 





"/i«g(y|/i«)" 









(114) 



(115) 
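The linearization of the quadratic forms in (108) rests on a Gaussian-integral identity of Hubbard-Stratonovich type. A generic scalar version, stated here as an assumption rather than as the exact form of (109), is $e^{\sigma^2|a|^2} = (\pi\sigma^2)^{-1}\int_{\mathbb{C}} \exp(-|y|^2/\sigma^2 + a^{*}y + a y^{*})\,dy$, which follows by completing the square. The sketch below checks it numerically on a grid; the function name `hs_integral` and the test values are hypothetical.

```python
import math


def hs_integral(a: complex, sigma2: float, half_width: float = 8.0, steps: int = 400) -> float:
    """Midpoint-rule value of (1/(pi*sigma2)) * integral over C of
    exp(-|y|^2/sigma2 + conj(a)*y + a*conj(y)) dy."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        re = -half_width + (i + 0.5) * h
        for k in range(steps):
            im = -half_width + (k + 0.5) * h
            y = complex(re, im)
            # conj(a)*y + a*conj(y) = 2 * Re(conj(a) * y)
            exponent = -abs(y) ** 2 / sigma2 + 2.0 * (a.conjugate() * y).real
            total += math.exp(exponent)
    return total * h * h / (math.pi * sigma2)


if __name__ == "__main__":
    a, sigma2 = 0.3 + 0.2j, 1.0
    print("numeric:", hs_integral(a, sigma2))
    print("exact:  ", math.exp(sigma2 * abs(a) ** 2))
```

Read from right to left, the identity trades a term quadratic in $a$ for terms linear in $a$ at the cost of one extra Gaussian integration, which is what allows the replicas to decouple.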



E. Replica Continuity 

Equations (106), (113), and (114) provide the coupled fixed-point equations of $(a - b - b^{*} + c,\ d - c)$ under the
RS assumption, and are well defined for $n \in \mathbb{R}$. We regard $n$ as a real number and take the limit $n \to +0$ to obtain



(4"^ = 7Vo + PE 



al = No + PI 



,(1) 



iai'>^)^ = No + P - ctI + alE 



(116) 
(117) 
(118) 
(119) 



where the expectations for y are taken with respect to the measure p{y\h'f'^)dy. Note that E[|/i^^'' — (/ii^')!^] and 
E[|ft.^^^ - {h^p)\'^] depend on {a'^^^f, ct^., {a'^c^f, and cr^. Furthermore, the quantity ( fToTT l is given by 

E[|/.(")-(/^«)p] for/^A ^^^^^ 
for / = /2. 

Under the assumption of the commutativity between the large-system limit and the limit $n \to +0$, the substitution
of (120) into (62) or (63) gives



1 9 

lim lim — -— lnZ„(w: /) = 



lim ^ = E 



lim pt — Q 

Af-s-oc 



(121) 
(122) 



in the large-system limit. Note that we have implicitly assumed that the right-hand sides of (121) and (122), obtained
by the replica method, coincide with the correct ones.

In order to complete the derivation of Proposition]?] we show that ( 1121b reduces to ( |28] l. defined by the fixed-point 
equations dllll and dMll. Since /ij"' - C7V(0, 1), the quantities E[|/iJ°^ - {h^l^)\^] and E[|/i^^' - {h^p)\^\ reduce to 

2" 



= £M 1 



[a^'frP , (a(°))2(l-r)ar 



E 



with 



1 



(123) 
(124) 

(125) 
Equations (117), (119), and (124) provide a closed form for $(\sigma_{tr}^2, \sigma_c^2)$. Furthermore, (116), (118), and (123) for a
given solution $(\sigma_{tr}^2, \sigma_c^2)$ form two independent linear equations with respect to $(\sigma_{tr}^{(0)})^2$ and $(\sigma_c^{(0)})^2$, and have the
unique solution $((\sigma_{tr}^{(0)})^2, (\sigma_c^{(0)})^2) = (\sigma_{tr}^2, \sigma_c^2)$. These observations indicate that the averaged MSE (121) is given as
J, defined by the fixed-point equations (l35l l and 



Appendix C
Derivation of Proposition 5



A. Formulation

[• • • ] denote the expectation with respect to Y\f and X[i „^yt given t, Xm,t, and 64. 
It is sufficient from Assumption |2] to show that E^^^ ^ Jp(J„i^t|xm^t,2t, a;[i 0[„j j\/]_j)], given by ( |20] l. 
converges in law to the equivalent channel (|56] | in the large-system limit. Substituting the posterior pdf ( [TtT i into 
(|20] i and then introducing a non-negative number n, we have 



Let E. 



E 



m.t \ ^m,t i^ti ^[l,rn),ti ^[m,AI],t) 



lim 



?7-!- + 



(126) 



with 



Zi^) = E^ 



p{yt\xt,It)p{x 

M],t)dX[jn,M]; 



n-1 



piyt\xt,it) 



Xp(i[m,J\/],t|0[m3f],O'^*(m,M],tP(yt|a;t,2:'t)p(3;(m,M],t|0(m,M],i)rfa;(m,M],trfyt 



(127) 



where we have introduced Xt = {{x[i^,n).t) , (i[m,M].t) ) , in which i[TO,Af],f = {xm,t, ■ ■ ■,iM,t) has the same 
statistical properties as x^^ ^^ f. Furthermore, i(m^M],t is given by i(,„^M],i = (^m+i.t, ■ • ■ , ^m.*)"^- Note that 
(127) is a quantity of $O(1)$, while (61) is a quantity of $O(e^{M})$. Thus, we have to evaluate (127) up to $O(1)$ in the
large-system limit.

Let us regard n in (127) as a positive integer. For n = 2, 3, . . ., (127) reduces to a special expression.



Zi'^'> ^ E. 



\[p{yt\x[''\lt) n [p{x\^lM]^^[rn.M].t)dx\^l^,^^ ^^ 



a=2 



(1) 



.(1) 



(128) 



In ( 1128b . = ((a;[i,m),t)^: (^[m,A/],t)^)^ ^ denotes replicas of i( for a = 2, . . . , n, in which jvf],t = 
(a::^\, . . . , x^'^^'^)'^} conditioned on 9[„i^M],t ^le independent random vectors drawn from p{x[jn,M],t\(^[m.M].t)- For 



notational convenience, we have written x^ 



m,M],t 



and X 



m,M],t 



as X 



(a) 



fl^],t - i^nZ' ■ ■ ■ '^^m!*)^ a = and 

(a) 

■m+l,t' ■ 



is defined as x 



(a) 



a= 1, respectively. The vector aj^^^^,^ ^......^ a. ^J,* 

written a;t and Xt as ajj""* = ((a;[i,m).t)"'", (a;[m mj J"^)"^ for a = and a = 1, respectively. 



'^m'**)"^- Furthermore, we have 
B. Average over Quenched Variables

We first calculate the integration in ( I128I I with respect to y^. The substitution of ( fTSl l into ( 11281) gives 



n 11 



a=0 



a=2 



(129) 



where $H^{(a)} = ((h_1^{(a)})^T, \ldots, (h_N^{(a)})^T)^T \in \mathbb{C}^{N\times M}$ denotes replicas of $H$ for $a = 0, \ldots, n$: $\{H^{(a)}\}$ conditioned on
$\mathcal{I}_t$ are mutually independent random matrices drawn from $p(H|\mathcal{I}_t)$. This expression is useful since the covariance
matrix of $p(y_t|H^{(a)}, x_t^{(a)})$ does not depend on $x_t^{(a)}$, while the covariance matrix of $p(y_t|x_t^{(a)}, \mathcal{I}_t)$ depends on
$x_t^{(a)}$. Using the fact that the row vectors $\{h_{n'}^{(a)}\}$ of $H^{(a)}$ are mutually independent, we obtain



with 



I n' = l " L a=0 ^ ^ > , 

m-\ M 

'-^"■n',m',f^rn'.t "t" / ^ "■n',m"^m', 



■^m,t^ •^m,t^ ^ \ti 



(a) _ 



(130) 



(131) 



where h^^^t and A/i^, ^ denote the (ri',TO')th element of H^""^ and the LMMSE estimation error A/i|" ^, ^ = 
^n'\n' ~ hn',m',t, respectively, with hn',m',t denoting the (7i',m')th element of the LMMSE estimate dTSI) . In 
( 11301) . the expectation E • •] is taken with respect to the measure p{hL^} \it)dh\J . In the derivation of (11301) . 

I n' J 

we have eliminated the bias b — M^^/'^Y^^^Vli hn\m',tXm'.t known to the receiver by transforming {yt)n' into 
V — {yt)n' — b. Performing the Gaussian integration with respect to y, we have 



n 

n' = l 











[irNoY^il + n) 



•^m,tT'^m,t^ ^ \ti 



(132) 



with Vn 



, vl^^j^ j)-^. In ( 1132b . the matrix A is given by ( iTSl l. 



We next calculate the expectation in ( |132t with respect to }. Since „j/} conditioned on It are proper 
complex Gaussian random vectors^ the random vector Vn'.m,t conditioned on Xt ~ {a^t°^ ■ for all a} and It is 



also a proper complex Gaussian random variable with mean 



u 



n' ,ra,t 



m —m 



(133) 



and with the covariance matrix D = M ^diag{(cCj'^^)^HfCCj'^\ . . . , (a;j"'')-^Hta;j"''}, in which a;„i',t G C"+^ is 



„(«)i 



given by a;„ 



•^m'^i)^- l'^ '^1'^ same manner as in the derivation of ( |82] |. we take the expectation 



with respect to v„',m,t conditioned on Xt and It to obtain 



(d) 



■ P{^m]t\^rn.t) 



N 



•^m,t^ ■''m,tJ ^ \ti "t 



-Q gG(JV-iD)-«^,_,„^B(r>,Af-iA)«„, ,„_t 
ri' = l 

where $G(Q)$ and $B(D, N_0^{-1}A)$ are given by (78) and (83), respectively.

* We could not immediately conclude the Gaussianity of $v_{n',m,t}$ if the optimal channel estimator (9) was used.



(134) 
Finally, we evaluate the expectation in ( |134| l with respect to Y'^f. Expression ( fTSl l implies that the LMMSE 
estimates {(/in',m,t, ■ • ■ jhn'.M.t) ■ for all n'} conditioned on X\t are mutually independent circularly symmetric 
complex Gaussian random vectors with the covariance matrix I \ Thus, the vectors {m„/ m,t} conditioned on 
X\t and Xt are also mutually independent circularly symmetric complex Gaussian random vectors with covariance 
mUn',m,t)aiun',m,t)*a>\X\t,Xt] = A/" ^ (a;[^ '^^j _ (/ - ^ )a;[^'_^^j for all n'. Taking the expectation with 
respect to I'y, after some calculation, we have 

where the (n + 1) x {n + 1) Hermitian matrix is given by 



[m,M],f 



(135) 



(136) 



C. Average over Spin Variables 

In order to evaluate the conditional expectation in (1135b . we evaluate the pdf of conditioned on x. 



Ht, and Ot- Let us define the function /d((5^,Q(j) as 



/d(Qd,Qd) = Tr(QdQd) - lim — InFd(Qd), 



with 



E 



St, 



(1) ^(0) 



(137) 



(138) 



where a positive definite $(n+1) \times (n+1)$ Hermitian matrix $\tilde{Q}_d$ is defined in the same manner as (91). In (137), we
have implicitly assumed that the limit on the right-hand side of (137) exists. Furthermore, we define the saddle-point
$\tilde{Q}_d^{(s)}$ as



argsup IdiQd,Qd)- 



(139) 



We represent the pdf /i(Qd) of Qd conditioned on St, and 0t by using the inversion formula for the 

moment generating function of Q^, given by 



Using the saddle-point method in the same manner as in the derivation of (98) gives

M(Qd) 



^) |det{V^ Jd(gd,Qd^}r'e"*^^^('3-'3^^')+o(*^"^)[l + 0(M-i/2)], 



in the large-system limit. In ( I141l i, the function /d(Qd' Qd) is given by 

IdiQd^Qd) = Tr(QdQd) - ^lnFd(Qd)- 



(140) 



(141) 



(142) 



Furthermore, V?^ Id{Q,Qd) denotes the Hesse matrix of ( 1142b with respect to Qd- 

The factor $O(M^{-1})$ in the exponent in (141) is due to a small deviation of the saddle-point (139). The removal
or addition of one transmit antenna results in a small change of $M\mathrm{Tr}(Q_d\tilde{Q}_d)$, more precisely, in a change of $O(1)$.
This observation implies that I^iQ^, Q^) — /d(Qd, Q^) + 0{AI^^) in the large-system limit. Differentiating both 
sides with respect to at the saddle-point ( |139l l. we find that the gradient Vq /d(Qd, q|j '') of ( I142l i with respect 
to Qd at the saddle-point is 0{M^^), which explains the factor 0{M^^) in the exponent in ( 11411 ) since a deviation 

- (s) 

of the saddle-point results in a deviation of the exponent which is proportional to M|| Vq^ Jd(Q(j, 

We repeat the same argument to evaluate (135). Applying (141) to (135) and using the saddle-point method, we
arrive at

4'^^-p(^^^!*lC*)^^i?HQi^^Qd^)e-*'*^^''''ni + 0(M-^/^)]. (143) 
In (11431) . the function ^d{Qd) is defined as 



The saddle-point is given by 



arginf ^d(Qd)> 



with 



MQci) = Wei, ) - a-'GiN^'Qd). 



Furthermore, C^^Q^^Q^) is defined as 



CL^\Qd,Qd) = |det{V|^/d(Qd,Qd)}r'det{a-iv2j^G(7Vo-iQd)}-\ 



(144) 



(145) 



(146) 



(147) 



where '^q^G{Nq^Q^) denotes the Hesse matrix of G{Nq^Q^) with respect to Qd- Note that we have assumed 
the positive definiteness of the Hesse matrix '^q^G{Nq^Q^) at the saddle-point = 

The calculation of the stationarity conditions for (139) and (145) implies that $(Q_d^{(s)}, \tilde{Q}_d^{(s)})$ is given as the solution
to the coupled fixed-point equations



Qd — li™ 

A/-i.oo 



E 


g^gAfTr(Q,Q,) 


3t, 6t 


E 


gA/Tr(QdQJ 


3t, 





a 



-1 / A 

In+l + ^Qd 



(148) 



(149) 



D. Evaluation of Fixed-Point Equations 

In order to evaluate the coupled fixed-point equations (148) and (149), we assume RS. The assumption of RS is
consistent with Assumption 2, i.e., the assumption of the self-averaging property for the equivalent channel (20) [45].

Assumption 5. The solution $(Q_d^{(s)}, \tilde{Q}_d^{(s)})$ is invariant under all permutations of replica indices:



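$$Q_d^{(s)} = \begin{pmatrix} a_d & b_d\,1_n^T \\ b_d^{*}1_n & (d_d-c_d)I_n + c_d\,1_n 1_n^T \end{pmatrix}, \tag{150}$$

$$\tilde{Q}_d^{(s)} = \begin{pmatrix} \tilde{a}_d & \tilde{b}_d\,1_n^T \\ \tilde{b}_d^{*}1_n & (\tilde{d}_d-\tilde{c}_d)I_n + \tilde{c}_d\,1_n 1_n^T \end{pmatrix}. \tag{151}$$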
We first evaluate the fixed-point equation (149). Let us define $\sigma_0^2$ and $\sigma^2$ as

o-Q = ^0 + (fld - fed - &d + Cd), 

respectively. After some calculation, we obtain 



Od = 



6d = 



Cd = 



(a2 + na2)a2 



dd = Cd 



We next calculate the quantity exp{A/Tr(Q|j^''Q|i "*)}. Let \l I — "^t^^ denote a square root of I 



l a 



(c) 



(152) 
(153) 

(154) 



-(c) • 

al ', I.e., 



(c) 



/ — H( . The substitution of ( 1154b into that quantity gives 

2 



E 



j_„{c} (a) 



a=0 



r _ w('=)-r(''^ l|2 



+M^'^Y^t4^ + rfd Y^ix^'^Y^A"'' , (155) 



with (Tq = {na + (t^'^) ^ and a"^ — cr^ for a = 1, . . . , n. In order to linearize the quadratic form in ( |155l l, we 
use the identity 



with a = Y.2=o \Jl- St''^KAf],t/(Va<^a) to obtain 



2^ 



,MTr(Q(='Q<=') 



P n 



(156) 



(157) 



with D^n^ = (7rcr2)"(A/-™+i)(i + ^cr2/^2)Af-m+i^ (fBTI l. the functions goU|a5j"\Ht) and gaU|a;f\Ht) for 



(0) 



a = 1 , . . . , n are given by 



) 

■[mM]d 



(158) 
(159) 



respectively, where qa{z\x)'^ j^j^ j,H('^-') represents the pdf of a proper complex Gaussian random vector z a 



"[m,M],t 



a and covariance (J^I, i.e.. 



-exp 



\z- x/I-Si^'^x^'''^ 



t •^[m,M].t 



/V^ll 



(160) 



(7rcr2)M-m+l 

Finally, we evaluate the fixed-point equation (148). The substitution of (157) into (148) gives expressions of
$a_d - b_d - b_d^{*} + c_d$ and $d_d - c_d$ that are well defined for $n \in \mathbb{R}$. Taking $n \to +0$, we have

.(0) 



lim^(ad - fed - feS + cd)= Jrn^ | J qo{z\x]2,Mlv 
X ( {xfY^^xf^ + 



dz 



3t,0t ^(161) 
lim (dd-Cd) = lim -^E | j qo{z\xf 



.M].V 



with 



,(1) 



[m,M],t 



,(1) 
'[m,M],t 



)) 



,(1) 



,(1) 



Ht,0t^ (162) 



(163) 



Substituting these expressions into (152) or (153), we have the coupled fixed-point equations



.,M],t 



,M],t 



)) 



where the average over z is taken with respect to the measure 9o(:Z|3;[°j ^.^j j, a['^^)dz. 

The coupled fixed-point equations (164) and (165) have the solution $\sigma_0^2 = \sigma^2$. Nishimori's result implies
that $\sigma_0^2 = \sigma^2$ is the correct solution. Assuming $\sigma_0^2 = \sigma^2$, we have the single fixed-point equation

a' ^ No + Vim LTr{St) + V{a'), 

M—>-oo M 



(164) 

}■ 

(165) 



implies 



(166) 



with 



V(a^) = lim — E 



.(1) 



.(1) 



(167) 



In ( 1166b . the average over z is taken with respect to the measure qoiz.lx'^^ j^^j ^, a'f'')dz with ctq = cr^. Furthermore, 



,M],t/ 

given by 



denotes the expectation of x 



(1) 



^,^] J with respect to the posterior measure qi{x'^[2,M],M' ^t''')d'x''im,M].v 



9l(^[m,Ml,tl-' ^'f \^[m,M],t) - 



<ll{^\X[2,M],t^ "t ')P(^[77i,M],t\^mit) 



(1) 



(168) 



Note that the fixed-point equation (166) is equivalent to (57).
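A scalar fixed-point equation of this type is typically solved by successive substitution. The sketch below is a toy under loudly stated assumptions: the last term is replaced by `load * P * s / (P + s)`, i.e., a load factor times the MMSE of a Gaussian symbol of power `P` in noise of variance `s`. This Gaussian-input form, the parameter names (`n0`, `est_noise`, `load`, `power`), and the numerical values are hypothetical and are not the function $V(\sigma^2)$ of the paper, which depends on the actual biased signaling.

```python
import math


def solve_fixed_point(n0: float, est_noise: float, load: float, power: float,
                      tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate s <- n0 + est_noise + load * power * s / (power + s) until convergence."""
    s = n0 + est_noise  # start from the interference-free value
    for _ in range(max_iter):
        s_next = n0 + est_noise + load * power * s / (power + s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("fixed-point iteration did not converge")


def closed_form(n0: float, est_noise: float, load: float, power: float) -> float:
    """Positive root of s^2 + (P - c - load*P) s - c*P = 0 with c = n0 + est_noise (toy model only)."""
    c = n0 + est_noise
    b = power - c - load * power
    return (-b + math.sqrt(b * b + 4.0 * c * power)) / 2.0


if __name__ == "__main__":
    args = (0.1, 0.05, 0.8, 1.0)  # hypothetical parameter values
    print("iteration:  ", solve_fixed_point(*args))
    print("closed form:", closed_form(*args))
```

In this toy model the solution is unique. When the true equation admits several solutions, the text prescribes selecting the one that minimizes the free energy, so an iteration like this one would have to be started from several initial values and the candidates compared.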



E. Replica Continuity



We evaluate (143) under the RS assumption (Assumption 5). The function $G(N_0^{-1}Q_d^{(s)})$, given by (78), reduces to



(169) 



which is well defined for $n \in \mathbb{R}$ and tends to zero in $n \to +0$.
Applying (152), (153), and (154) to (142), we obtain



1 - 



-^lnPd(Q?). 



(170) 
Substituting (157) into the moment generating function (140), we have an expression of (170) well defined for
$n \in \mathbb{R}$. Taking $n \to +0$, under the assumption of $\sigma_0^2 = \sigma^2$, we have

=-]^ln J qi{x^^l\z,s[^\e[rnMht)qo{z\x'f^^^^^ (171) 

with the marginal ql{x':^]^\z,s[''\e[^^M].t) = J qiix'^[l]^M]M''^t''^ '^\-'^'^^^-^^'^^[liM],t °f Substituting 
( |169l l and (1171b into (1144b and assuming that the obtained expression is correct in n — > +0, from (11261) . we arrive 
at 

p{x 



X J Qlixin'M'^t''\d[rnMht)qo{z\x''^j^,j^^ (172) 

where we have assumed that the large-system limit and the limit $n \to +0$ are commutative. Due to the normalization
of pdfs, the prefactor (147) tends to 1 in $n \to +0$. This observation implies that the right-hand side of
(172) is equal to the equivalent channel (56) between $x_{m,t}$ and the associated decoder for the MIMO channel
with perfect CSI at the receiver.



F. Multiple Solutions 

The fixed-point equation (166) may have multiple solutions. In that case, one has to choose the solution minimizing
the quantity (146). Due to $\lim_{n\to+0}\Phi_d(Q_d^{(s)}) = 0$, the quantity $\Phi_d(Q_d^{(s)})$ is given by $\Phi_d(Q_d^{(s)}) = nF + O(n^2)$
in $n \to +0$, with the so-called free energy $F = \lim_{n\to+0}\frac{\partial}{\partial n}\Phi_d(Q_d^{(s)})$. Thus, one should choose the solution
minimizing the free energy $F$.

In order to calculate the free energy $F$, in the same manner as in the derivation of (170), we evaluate (137) as



1 



2 (t2 ((t2 + ncr^) 



lim In D^^) 

M->oo M 



lim i-ln / E^(o,[goU|a;i°\Ht)] f E^<i, [giUja^^^ Ht)]|" d^. (173) 



We differentiate (169) and (173) with respect to $n$ at $n = 0$ to obtain



A/-i.oo M ^ ' a 



1 



i^c(A^o|k2)+ lim — ^Tr(Ht)+ln(^eA^o) 

M^oo Ma 



with 



X In ■ 



Minimizing (1174b is equivalent to minimizing 



-da; 



(0) 



,A/],t 



(174) 



(175) 
Appendix D

Reduction of Proposition 5 to Proposition 1

Let us prove that the fixed-point equation (57) coincides with the fixed-point equation (37). We first show that the
last term in (37) is a lower bound of the last term in (57), by considering the MIMO channel (55) with additional side
information. Let a genie inform the receiver about the correct values of the data symbols $x_{(m,M],t}$. The MSE (58)
for the genie-aided receiver should provide a lower bound of the original one. In order to eliminate the inter-stream
interference from the MF output vector ( |60l l. the genie-aided receiver calculates — r — X]m'=m+i m'^ra',t/ot, 
given by 

'^s = -it ni^m.t + V- (176) 

a ' 

The performance of the interference-free channel (176), such as the MSE and the constrained capacity, is determined
by the SNR

Pll^ l|4 
snr= „ W^UrnW 

Proposition m and Lemma [T] imply that the numerator and denominator in ( 11771 ) are given by P||^f „J|"' = P(l — 
C2(r))4 + 0(M-V4) ^jjj aa^i"^{I - s[''^)it.i ^ acr^(l - + 0{M-^/*) in the large-system limit, 

respectively. Thus, the SNR ( 1177b converges in probability to snr = (1 — ^^(t))P/ {aa^) in the large-system limit, 
which coincides with the SNR for the AWGN channel dSTl i with (T^(t, /i) = cr^. This expression implies that the 
last term dSST l in the fixed-point equation (ISTT i is bounded from below by (1 — ^)(1 — ^■^(r))E[MSE((T^, 0,n,t)] in 
the large-system limit. 

We next prove that the last term in (37) is an upper bound of the last term in (57). Let us consider a suboptimal
receiver, which estimates $x_{m,t}$ only from the first element of the MF output vector (60), given by

rm= ^ (178) 

a — ' a 

7n' —'m-\-l 

with $\eta_m$ denoting the first element of $\eta$. In order to evaluate an upper bound of the MSE (58) for this suboptimal
receiver, we replace the inter-stream interference in (178) by the AWGN with the same variance. The MSE (58)
for the obtained channel provides an upper bound of the original one, and is determined by the SNR

snr = ^\ ^ * ^'"""^ , (179) 

which converges in probability to snr = (1 — £^'^{T))P/{aa'^) in the large-system limit, due to Proposition |4] 
and Lemma [T] This result implies that the last term (|58] | in the fixed-point equation (|57] | is bounded from above 
by (1 — — ^^(T))E[MSE(cr^, 6'„i.t)] in the large-system limit. Combining the two bounds, we find that the 
fixed-point equation (57) is equal to the fixed-point equation (37).

The argument described above implies that the inter-stream interference is negligible in the large-system limit. It 
is straightforward to confirm that the mutual information I{xm.t', S;m,t\^t'^\ (}[m,M].t) converges to the constrained 
capacity of the AWGN channel dSTb . i.e., the integrand in (|34] |. by repeating the same argument. Similarly, it is 
straightforward to find that (59) is equal to (38). Combining these results and the argument described in Section A-A,
we find that Proposition 1 holds.